DOCUMENT RESUME

ED 106 327                    32                    TM 004 442

AUTHOR          Durost, Walter N.
TITLE           A Description and Evaluation of the Statewide Testing
                Program in New Hampshire in 1968-69 and 1969-70 Under
                the Sponsorship of Title I and the Significance of
                the Data Obtained For Evaluation with this
                Activity.
INSTITUTION     New Hampshire State Dept. of Education, Concord.
SPONS AGENCY    Bureau of Elementary and Secondary Education
                (DHEW/OE), Washington, D.C. Div. of Compensatory
                Education.
PUB DATE        Mar 71
NOTE            84p.; Document not available in hard copy due to
                marginal legibility of original document; Product of
                the Test Service and Advisement Center, Lee, N.H.

EDRS PRICE      MF-$0.76 HC Not Available from EDRS. PLUS POSTAGE
DESCRIPTORS     Academic Achievement; Achievement Tests; Comparative
                Testing; *Compensatory Education Programs; Group
                Intelligence Tests; Intelligence; *Program
                Evaluation; Program Improvement; School Districts;
                *State Programs; *Testing Programs; Test
                Interpretation; *Test Results
IDENTIFIERS     Elementary Secondary Education Act Title I; ESEA
                Title I; *New Hampshire; New Hampshire Statewide
                Testing Program; State Testing Programs



ABSTRACT 

The New Hampshire statewide testing program was 
implemented to provide a data base for the evaluation of the 
effectiveness of Title I projects as required by Federal law. To 
accomplish this objective, achievement and intelligence tests were 
administered to children in Title I projects and regular programs in 
four elementary grades— 2, 4, 6 and 8. Thus the performance of 
children in both programs could be analyzed and compared. The 
information collected during the 1968-69 program was used as a basis 
for modifying and improving the 1969-70 program. Test results, 
statewide analysis and interpretation of the data are presented. 
CEVH) 



ERIC 



A DESCRIPTION AND EVALUATION
OF THE STATEWIDE TESTING PROGRAM IN NEW HAMPSHIRE
IN 1968-69 AND 1969-70
UNDER THE SPONSORSHIP OF TITLE I
AND THE SIGNIFICANCE OF THE DATA OBTAINED
FOR EVALUATION WITH THIS ACTIVITY



PREPARED BY: WALTER N. DUROST, Ph.D., DIRECTOR
TEST SERVICE AND ADVISEMENT CENTER 



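U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.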



Completed under contract of March 13, 1971.



New Hampshire Statewide Testing Program Evaluation 
1968-69 and 1969-70 



List of Contents 

SECTION I - Introduction, Federal Law regarding selection of students for Title I
            projects, and the 1968-69 testing program in New Hampshire

SECTION II - Steps taken to improve the 1969-70 Statewide testing program and the
             results of these efforts - Photocopy of "IN" and "OUT" Cards

SECTION III

  Part A - Chronological Age distribution in New Hampshire compared with the
           National group

    Table III-A-1 - Distribution of Chronological Ages separately by sex and
                    for the Total Group, Statewide Fall 1969, and ALP Norm
                    Group, Nationwide Fall 1967 - Grade 4
    Table III-A-2 - Same as above for Grade 6

SECTION III

  Part B - Evaluating the Measured Mental Ability of New Hampshire students
           against National Norms

    Table III-B-1 - Distribution of Otis-Lennon DIQs separately by sex and
                    for the Total Group - Fall 1969 - Grades 4 and 6
    Chart III-B-1 - Bivariate Chart showing the degree of correspondence between
                    paired scores - Stanford Paragraph Meaning vs Otis-Lennon -
                    Total Group - Fall 1968 - Grade 4
    Table III-B-2 - Correlations - Otis-Lennon and Stanford Tests - Total Group -
                    Fall 1968 - Grades 4 and 6





SECTION III

  Part C - Overall Description of the New Hampshire population as regards Achievement

    Table III-C-1 - Equivalent Grade Scores - Stanford and Metropolitan - for
                    25th, 50th and 75th Percentile Ranks - Total Group - Fall
                    1969 - Grades 4 and 6
    Table III-C-2 - Analysis of SAT and OLMAT Characteristics relating to "Goodness
                    of Fit" of each test for the level at which it was used -
                    Total Group and Random Sample - Fall 1969 - Grades 4 and 6

SECTION III

  Part D - Variation among Communities

    Table III-D-1 - Frequency and Cumulative Percent distributions of linear
                    standard scores corresponding to 137 School District Means -
                    OLMAT and selected Stanford subtests - Fall 1969 - Grade 4
    Table III-D-2 - Same as above for Grade 6
    Table III-D-3 - Intercorrelations of District Means in standard score form -
                    OLMAT and selected Stanford subtests - Fall 1969 - Grades 4 & 6
    Figure III-D-1 - Sample Profile of 3 New Hampshire School District Means -
                     Fall 1969 - Grade 4



SECTION IV - Description of the Title I Population from the "IN" and "OUT" Cards

    Table IV-1 - Number and percent of cases enrolled in each Title I project
                 category - 1969-70 - Grades 2, 4, 6 and 8
    Table IV-2 - Distribution of hours of instruction for Title I pupils -
                 1969-70 - Grades 4 and 6
    Table IV-3 - Number and percent of pupils by type of instructional personnel
                 involved - 1969-70 - Grades 4 and 6
    Table IV-4 - Number and percent of 1969-70 Title I pupils who were in Title I
                 projects in 1968-69 school year - Grades 4 and 6
    Table IV-5 - Entry date for children in the 1969-70 Title I Program (Tested
                 Sample) - Grades 4 and 6
    Table IV-6 - Duration of Title I experience for all available cases tested
                 in the Spring of 1970 - Grades 4 and 6
    Table IV-7 - Reason for termination of participation in 1969-70 Title I
                 project - Grades 4 and 6
    Table IV-8 - Teacher Judgment concerning the success of the Title I program
                 1969-70 - Grades 4 and 6






SECTION V - The Random Sample - The Need for a Random Sample testing program, and
            Determining the representativeness of the tested Random Sample

    Chart V-1 - Normal Percentile Chart - Distribution of OLMAT DIQs - Total
                Group and Random Sample - Fall 1969 - Grade 4
    Chart V-2 - Same as above for Grade 6
    Table V-1 - Raw Score comparisons - Total Group and Random Sample - Fall
                1969 - Grades 4 and 6
    Table V-2 - Comparison of Means and Standard Deviations - Total Group and
                Random Sample - Fall 1969 - Grades 4 and 6

SECTION VI - The tested Title I population in New Hampshire described and compared
             with the Random (Representative) Sample

    Table VI-1 - Distribution of Chronological Ages separately by sex and for
                 the Total Group - Random Sample and Title I - Fall 1969 - Grade 4
    Table VI-2 - Same as above for Grade 6
    Table VI-3 - Distribution of Otis-Lennon DIQs separately by sex and for the
                 Total Group - Random Sample - Fall 1969 - Grades 4 and 6
    Table VI-4 - Same as above for Title I

SECTION VII - Single-Variable comparisons of Fall-Spring performance for the Random
              Sample and for Title I cases

  Part A - Some basic measurement problems

  Part B - Comparisons in raw scores and grade equivalents for the Random Sample
           and Title I cases

    Table VII-B-1 - Percentiles corresponding to selected percentile ranks
                    with corresponding Stanford and Metropolitan grade equivalents -
                    Random Sample - 1969-70 - Grades 4 and 6
    Table VII-B-2 - Same as above for Title I
    Table VII-B-3 - Expected change in selected Stanford scores over 7 months
                    of in-school instruction - Grades 4 and 6
    Table VII-B-4 - Fall and Spring raw score means, standard deviations and
                    gains - Random Sample and Title I - 1969-70 - Grades 4 & 6
    Table VII-B-5 - Comparison of Fall and Spring gains involving the Median
                    for Title I vs the 25th percentile for the Random Sample -
                    1969-70 - Grades 4 and 6

SECTION VIII - Bivariate comparison of Fall-Spring performance for Random Sample and
               Title I cases

  Part A - Bivariate distributions as a means of comparing Fall and Spring test results

    Chart VIII-1 - Stanine bivariate charts showing the relationship between
                   Fall and Spring results for selected Stanford Achievement
                   Tests - Random Sample and Title I - 1969-70 - Grade 4 -
                   Word Meaning
    Chart VIII-2 - Stanine bivariate as above - Grade 4 - Paragraph Meaning
    Chart VIII-3 - Stanine bivariate as above - Grade 4 - Arithmetic Computation
    Chart VIII-4 - Stanine bivariate as above - Grade 4 - Arithmetic Concepts
    Chart VIII-5 - Stanine bivariate as above - Grade 4 - Arithmetic Applications
    Chart VIII-6 - Stanine bivariate as above - Grade 6 - Word Meaning
    Chart VIII-7 - Stanine bivariate as above - Grade 6 - Paragraph Meaning
    Chart VIII-8 - Stanine bivariate as above - Grade 6 - Arithmetic Computation
    Chart VIII-9 - Stanine bivariate as above - Grade 6 - Arithmetic Concepts
    Chart VIII-10 - Stanine bivariate as above - Grade 6 - Arithmetic Applications

  Part B - Data concerning the Measures of Relationship between tests administered
           in the Fall and repeated in the Spring

    Table VIII-B-1 - Correlations between selected Stanford subtests administered
                     in the Fall and repeated in the Spring in comparison with
                     reported reliability coefficients - RS & T.I - 1969-70 -
                     Gr. 4 & 6
    Table VIII-B-2 - Intercorrelations of selected Stanford subtests - Random
                     Sample and Title I - Fall 1969 - Grades 4 and 6
    Table VIII-B-3 - Same as above - Spring 1970
    Table VIII-B-4 - Means, Standard Deviations, and Correlation Coefficients -
                     Random Sample and Title I - Fall vs Spring - 1969-70 -
                     Grades 4 & 6

SECTION IX - A Personal Commentary

APPENDIX A - Intercorrelations of SAT and OLMAT - Total Group - Fall 1968 - Grades 4 & 6
APPENDIX B - Correlation between OLMAT and SAT - from OLMAT Technical Handbook






A Description and Evaluation 
of the STATEWIDE TESTING PROGRAM in NEW HAMPSHIRE
in 1968-69 and 1969-70
Under the Sponsorship of Title I
And the Significance of the Data Obtained
For Evaluation With This Activity 



SECTION I 



Introduction

Research is a dirty word to many and an ambiguous word to those who
endeavor to carry out activities so named. Was Edison doing research
when he was experimenting with filament materials for the electric
light bulb? The writer feels that he was. His efforts represented a
planned attack on the problem with more or less precise specifications
of the capabilities of the material being sought. His efforts, however,
could never have been subject to PERT analysis and he never would have
gotten a government contract.

In many ways this study faced the same dilemma. All involved had a
fairly clear picture of what needed to be done; namely, to make use of
objective testing and data collection procedures plus appropriate
analysis techniques to arrive at a value judgment as to the "goodness"
of the Title I effort in this state. This objective was hedged about by
many restrictions of regulation, administration and philosophy, some of
which were almost directly incompatible with the goal stated above.
Much was done toward reaching the goal; much more could have been done
under other circumstances. The following pages constitute a real effort
to contend with all the difficulties and present eventually some kind
of meritorious report. A good place to start the report is with a
statement of the purposes and procedures in setting the Title I
evaluation program in motion in New Hampshire.

Provisions of the Federal Law as Regards the
Selection of Students for Title I Projects

Perhaps some information at this point concerning the development and
implementation of Title I as a major part of ESEA will provide a
background which will clarify some of our problems in regard to the
subsequent data analysis.

Money for Title I programs is allocated to each state in terms of the
number of families, county by county, falling below the national
poverty level plus some other considerations, such as number of
families receiving Aid for Dependent Children and number of children in
foster homes. All these data are used, along with similar information
from the other states, to determine the proportion of available funds
to come into this state. However, the Title I office within the State
Department of Education determines the distribution of money by school
district as against county, again being guided by the economic
considerations as listed above. In other words, the State Department's
allocation of funds to a district is strictly in accordance with the
statistics concerning the number of families qualifying in that
district as defined above.

School districts in New Hampshire, like those in all other states in
the country, are required by Title I regulations to designate target
schools except under certain conditions. Basically, this provision is
inoperative in New Hampshire in a majority of the school districts
because there exists only a single attendance area. In the larger
cities, such as Dover, Portsmouth, and the like, where there are
multiple attendance areas, certain schools are designated as target
schools. In such instances, Title I projects are confined solely to
these schools.

Subsequently, each school district is responsible for submitting
projects to the State Department Title I office for approval as a basis
for the specific allocation of monies to fund these projects.

Each school district proposal is expected to state very explicitly the
grades involved, expense for personnel and material, and, finally, the
method to be used to evaluate outcomes. Such evaluation is mandatory
according to Federal regulations.

These Federal regulations favor objective testing as the primary basis
for evaluation and make some further stipulations in regard to progress
as measured in grade equivalents, which make no sense from a
measurement point of view.

Obviously, when a very large proportion of the students in Title I at
all grades are involved in reading projects, it makes sense to use a
reading test as part of an evaluation and possibly even as the basic
instrument for assessing outcomes. This heavy emphasis on reading,
incidentally, is a rather general characteristic of Title I projects
throughout the country.

The U. S. Office of Education reflects this emphasis by its recently
activated project for equating the major reading tests available
through commercial channels and restandardizing one of them
(Metropolitan) as an anchor test.

School districts have the option of selecting the tests to be used
within the district for evaluation of their own Title I programs. Some
choose to use tests consistent with those used throughout the school
district for all pupils, even if these are not particularly suitable
for the purpose. Consequently, the number of Title I children tested
Fall and Spring in New Hampshire with the state-designated tests is
substantially lower than the total number of children in Title I in the
respective grades. This must be considered an exercise of democracy at
the expense of the common good.

As a matter of fact, the proportion of Title I students in the tested
Spring sample is only slightly more than 50% of known Title I cases,
i.e., cases for whom both "IN" and "OUT" cards are available. For
example, in Grade 4, about 100 pupils were enrolled in Title I projects
in large districts which chose NOT to test in the statewide testing
program.

In addition to the attrition due to the failure of a district to take
part in the state program at all, account must be taken of the
difficulties involved in matching Fall and Spring tested cases, as
described elsewhere, as well as other causes of attrition which are
hard to assess as to total effect. These include such factors as
absence from school for one test or another or for one segment of the
battery administered in the Fall or in the Spring. Tested cases used in
this analysis included only those for whom we had test information on
the five skills tests used; namely, Word Meaning, Paragraph Meaning,
Arithmetic Computation, Arithmetic Concepts and Arithmetic
Applications.

Another serious source of error is in the 
failure of some school districts to test in the 
Spring even though they did participate in the 
Fall testing program. 

The analysis in Section IV is based upon completely tested cases only,
matched for Fall and Spring testing. Only in Table IV-1 of this
section, concerned with determining the percent of children in each
category of Title I projects, does the report utilize the complete set
of "IN" cards.
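
The selection rule described above, retaining only pupils tested in both Fall and Spring on all five skills tests, can be sketched roughly as follows. This is an illustration only: the record layout, the subtest keys and the names `has_all_scores` and `completely_tested` are assumptions made for the example and do not describe the processing actually carried out for the study.

```python
# Hypothetical keys for the five Stanford skills tests named in the text.
SKILLS = (
    "word_meaning",
    "paragraph_meaning",
    "arithmetic_computation",
    "arithmetic_concepts",
    "arithmetic_applications",
)


def has_all_scores(scores):
    """True when a raw score is present for every one of the five subtests."""
    return all(scores.get(name) is not None for name in SKILLS)


def completely_tested(fall, spring):
    """Return {pupil_id: (fall_scores, spring_scores)} for matched, complete cases.

    `fall` and `spring` map an assumed pupil identifier to a dict of
    subtest name -> raw score (None or missing when the subtest was not taken).
    """
    matched = {}
    for pupil_id, fall_scores in fall.items():
        spring_scores = spring.get(pupil_id)
        if spring_scores is None:
            continue  # tested in the Fall only; cannot be matched
        if has_all_scores(fall_scores) and has_all_scores(spring_scores):
            matched[pupil_id] = (fall_scores, spring_scores)
    return matched
```

A pupil absent for a single subtest in either season drops out entirely under this rule, which is one reason the tested sample is so much smaller than the documented Title I population.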

From all of the above, it must be amply clear that it would be very
difficult to defend the Title I completely tested sample as being
representative of all Title I cases in the State at any grade level. It
would be difficult even to defend the sample statistically as being
representative of those pupils in academically oriented programs or,
even more specifically, in corrective and remedial programs in reading.

On logical and experiential grounds, it SEEMS representative. Even if
it is not precisely so, it still constitutes a viable population of
Title I pupils, defined by all the descriptive information reported
here. It is therefore quite legitimate to report for this defined
population the results of the analysis of the "IN" and "OUT" card data.
At least the two aspects of the analysis are comparable. 1/

1/ These are the questionnaire aspect versus the statistical aspect.

One important fact concerning the use of Title I funds is often
overlooked. These monies are not intended to defray expenses ordinarily
provided for in the regular school budget. For the sake of
illustration, let us assume that we are discussing an urban community
with designated target schools. One or more such target schools may be
receiving money for a remedial reading program but OTHER NON-TARGET
SCHOOLS may be just as needful of these services. Still the district
cannot use Title I funds to finance a similar remedial reading program
in these other schools, even though badly needed, because such schools
do not satisfy the criteria for Title I aid established in the Law and
enabling regulations.

Considerations such as this highlight the importance of the "IN" and
"OUT" cards as basic control documents. Test results relating to Title
I pupils in this study are reported only for documented Title I cases.
It is unfortunate that up to 50% of these Title I cases, so identified,
were not tested for reasons specified above. More might have been
tested if some coercion had been used. The Title I office did not wish
to be put in the position of dictating the method of evaluation (i.e.,
by means of a particular standardized test) to be used for all
projects, especially if the project, such as a speech therapy facility,
was clearly not subject to evaluation by a standardized achievement
test.

No coercion was employed to obtain participation if a good
district-wide testing program would have been impaired by insisting
that the Stanford Achievement Test be given to ALL Title I students.
There is indeed a serious dilemma here! How can a truly representative
evaluation for Title I enrollees be produced if no central office has
the authority to prescribe the conditions and instruments to be used?

The other "horn" of the dilemma relates to the reasonableness of any
statewide evaluation of the Title I program by means of standardized
tests that does not also involve a district by district evaluation,
taking due account of the degree of appropriateness of the instrument
in terms of the local project design. In such total population studies,
everything tends to be reduced to the level of mediocrity, whereas the
truth is that some districts excel and some fail miserably, even in a
limited area such as reading.

Let it be abundantly clear that this writer has no recourse but to
report results on the samples tested Fall and Spring and only touch
marginally on the relative goodness of the effort from district to
district. This is a source of no little frustration. District averages
for Title I cases are not a satisfactory answer. Aside from the
misleading nature of such averages, the number of Title I cases is so
small in many districts that one would have to resort to weighting of
some sort to make comparisons fair.
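
The weighting problem mentioned above can be made concrete with a small sketch. The report does not say which weighting scheme would be appropriate; weighting each district mean by its number of tested cases is assumed here purely for illustration, and the function names and figures are invented.

```python
def unweighted_mean(district_stats):
    """Average of district means, every district counting equally."""
    means = [mean for mean, _ in district_stats]
    return sum(means) / len(means)


def weighted_mean(district_stats):
    """Average of district means, each weighted by its number of cases."""
    total_cases = sum(cases for _, cases in district_stats)
    weighted_sum = sum(mean * cases for mean, cases in district_stats)
    return weighted_sum / total_cases


# Invented figures: (mean raw score, number of Title I cases) per district.
example = [(30.0, 2), (20.0, 48)]
print(unweighted_mean(example))  # the 2-pupil district counts as much as the 48-pupil one
print(weighted_mean(example))    # dominated by the district with 48 pupils
```

With these invented figures the two averages differ by several raw-score points, which is the sort of distortion a district with only a handful of Title I pupils can introduce.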

Finally, averages are of little practical value for comparison purposes
unless the criteria for selection of cases in terms of educational need
are highly standardized and the type of treatment is at least roughly
specified.

The 1968-69 Testing Program in New Hampshire

Early in 1968 a group of people in the State Department of Education,
headed by Mr. James Carr, State Director of Guidance, working closely
with the Title I office, Mr. William Sterling, Director, planned a
statewide testing program, one of the main purposes of which was to
provide a data base from which to evaluate the outcomes of Title I
projects in accordance with Federal law. While the author of this
report was consulted sporadically, there was no official relationship
at the beginning of the program, especially in the selection of the
tests to be used. Subsequently, Test Service and Advisement Center was
asked to review, summarize, expand and interpret the test data for the
'68-'69 program and subsequently to do the same for a repeat program in
'69-'70. This report is largely concerned with this second evaluation
and will include the presentation of certain data which have not
previously been made public.

The grades involved in both the '68-'69 and '69-'70 programs were 2, 4,
6 and 8. Alternate grades were chosen because it was not economically
feasible to test all grades, although Title I programs were going on in
all grades. The basic plan was to test as comprehensively as possible
in these four grades and then to evaluate the results of the Title I
programs in the state in terms of a repeat testing program in the
Spring, using the same test, even the same forms. The plan to use the
same forms in the Spring was not considered to be advisable by this
writer, although in retrospect it seems not to have made much
difference, as will be seen by making some comparisons of the '68-'69
and '69-'70 data.

The Plan for Testing

Although this is not the place to go into detail as to the plan adopted
for implementing these programs, it is essential to point out that, for
the two years involved, the State contracted with Harcourt Brace
Jovanovich (then Harcourt, Brace & World, Inc.) to conduct the scoring
and the analysis of the data through its Programs and Services
Division. This arrangement was arrived at only after Mr. Carr and
others involved at the time consulted, personally, with both Harcourt
and Educational Testing Service as to the best program possibilities.
The original planning contemplated a three-year cycle.

The tests chosen were as shown in the table below:

             Otis-Lennon             Stanford
             Mental Ability Test     Achievement Test

Grade 2      Elem. I, Form J         Prim. I, Form X
Grade 4      Elem. II, Form J        Inter. I, Form X
Grade 6      Elem. II, Form K        Inter. II, Form X
Grade 8      Inter., Form J          Advanced, Form X



All materials in this program were scorable by means of the optical
scanning equipment available at the Measurement Research Center at Iowa
City, but the batteries used at Grade 2 consisted of scorable booklets
which were consumed in the process. The booklets for the remaining
grades, 4, 6 and 8, were reusable and were left in the schools to be
used over again, as needed.

In the Spring, the Stanford Achievement Battery was to be repeated. In
every case, the same form was used except for Grade 2, in which Form W
of the Primary II Battery of Stanford was substituted for Form X. Note:
This was a change of both form and battery. The pattern of using an
alternative form for Spring testing in all grades was conceded to be
the appropriate pattern but considerations of economics prevailed and
the same form was used over again in Grades 4, 6 and 8.

This procedure, as outlined above, was re- 
peated, so far as the tests are concerned, in the 
1969-70 program. 



Evaluation of the 1968-69 Program 

In evaluating the 1968-69 program, it must be remembered that a program
of this magnitude had never before been undertaken in New Hampshire and
although the Division of Programs and Services at Harcourt, under the
contract, was responsible for providing technical and professional
leadership, the amount of this leadership was minimal. The
responsibility for this minimal involvement, however, is a shared
responsibility since there was no great insistence on the part of the
State Department personnel that such professional assistance be
provided beyond the barest outline of procedures.

The evaluations which follow must be recognized to be those of the
writer, who was called in to provide technical and professional
assistance in a further analysis of the data beyond the point carried
out by MRC. The writer's task was to try to critique the 1968-69
program to suggest ways in which it might be improved in 1969-70. Every
effort has been made to be objective and fair to all persons concerned,
but the net judgment must be that the '68-'69 program did not yield all
of the data needed to carry out the purposes of the program as stated
and the actual administration of the program left much to be desired in
terms of quality of test administration, document preparation for
scoring and adherence to schedule.

All tests were to be administered throughout the state during a more or
less uniform period in mid-October, but actually the test
administration was carried on over a longer period of time than was
planned. There was no central office supervision to ensure that the
tests were properly administered, although there is little evidence
that they were grossly misadministered at the local level.






The return of the data to MRC for scoring, which should have been
highly uniform, both in terms of time in order to facilitate
processing, and in terms of preparation of essential coding
information, was very poor. Many schools were very late in returning
their booklets to MRC. Furthermore, preparation of header sheets was
sloppy, personal ID information was often missing and the protocols
were not properly arranged to facilitate rapid processing and return of
reports. Consequently, the reports did not come in for many weeks after
the tests were administered and actually some community reports did not
get back to the schools until months after the test administration.

The analysis at the Measurement Research Center, Iowa City, as
specified to them by Harcourt, was fairly comprehensive. This consisted
of the basic MRC service, plus options which could be chosen by the
local communities as they wished. Thus, every community received class
rosters by school and class, giving the results of both the Otis-Lennon
tests and the Stanford Batteries involved. For Otis-Lennon, scores and
IQs were reported together with local stanines. For Stanford, grade
scores, grade equivalents, percentile ranks and stanines were reported.
In addition, for the Stanford test, item analysis data were made
available for the entire state and separately by community where the
communities wished to have this information. These statewide item
analysis data were handed over to the members of the Division most
directly concerned. For example, the item analysis data for the state
as a whole in the field of mathematics went to the consultant in
mathematics, the information on science went to the consultant in
science, etc. So far as the writer knows no consistent, systematic use
was made of this information at the State level. It is impossible to
know, of course, to what extent communities made use of the data.

Workshops were held around the state on a regional basis but these were
one-day workshops. The morning session was for administrators and the
afternoon session for teachers. The coverage of the test information
was superficial and so far as the writer could observe afterwards, by
making some study of this aspect of the program, the impact was
minimal. On the other hand, it must also be said that there was no
substantial dissatisfaction indicated, perhaps because there was little
sense of need for such in-service training.

Identification of Title I Cases

The identification of Title I cases was done by means of the allocation
of one space on the "Other Data" section of the MRC answer sheet or ID
page of the Primary I Battery to indicate that the pupil was in a Title
I project. Unfortunately the tests were given too early in the year for
this determination to have been made in every case. Many pupils,
thought to be qualified for Title I help, were so indicated, even
though subsequently they did not take part or only stayed in the
program briefly, while many pupils who did subsequently take part in
the program were not identified. Many more were originally identified
as Title I than actually were served by the schools.

Furthermore, there was no differentiation as to the category of Title I
projects involved; the child was merely designated as a Title I
subject. This failure to identify type of projects was quickly picked
up at the first group meeting, during the summer after testing, called
to discuss the results of the statewide testing and this constituted
one of the major directions for change in the 1969-70 program.

Outcomes for 1968-69

The '68-'69 data returned to the communities by MRC provided the means
for the local community to make use of the data not only in connection
with Title I but with all children who took the test. Since the sample
of schools and school districts participating was good, this meant a
very substantial majority of children enrolled in these particular
grades in the state were tested. No systematic attempt was made to
follow up the extent to which the data were used and subsequent
evidence makes it very clear that lack of knowledgeable supervision in
the specific area of test utilization at the local level diminished
greatly the effectiveness of the testing program. School districts in
New Hampshire do not routinely have a person or persons on their staff
designated as expert in the field of testing, application and
utilization of test results for the improvement of instruction.

The Involvement of Test Service and Advisement Center

The writer, as the proprietor and director of Test Service and
Advisement Center, was asked to coordinate the further analysis of the
data over and above that done by MRC, working in cooperation with the
Bureau of Educational Research and Testing Services of the University
of New Hampshire. The intent was to maximize the utilization of the
data to the extent possible on an "after the fact" basis. The effort to
do this was a very frustrating experience in which the intent and
purpose of the assignment was nearly nullified by the laxity with which
the tests were administered.

In the first place, improper or incomplete ID information made it
impossible to compare and collate the information from the mental
ability test and the achievement test. Contrary to instructions, in
many cases, full information concerning sex and chronological age was
not provided for every student, nor was the student's name properly
coded on all answer sheets. A flagrant violation of this coding
process was the use of nicknames in place of the full legal name of
the child. Thus, the early decision to develop an intrinsic code
based upon selected letters in the child's name,







plus birthdate, plus sex, with due account of whether or not the
child was a twin, was completely frustrated, and a very substantial
proportion of the cases were lost.
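
The intended self-generating code can be illustrated with a short
sketch. It is only an illustration: the report does not say which
letters of the name were selected, so the choice below (first three
letters of the surname plus first two of the given name), the
function name pupil_code, and the layout of the code are all
assumptions.

```python
from datetime import date


def pupil_code(last, first, birthdate, sex, twin=False):
    """Build a hypothetical self-generating pupil code.

    Assumed layout: 3 surname letters + 2 given-name letters,
    birthdate as YYMMDD, sex initial, and a twin flag (T or N).
    """
    letters = (last.strip().upper()[:3] + first.strip().upper()[:2]).ljust(5, "X")
    twin_flag = "T" if twin else "N"
    return f"{letters}{birthdate:%y%m%d}{sex.strip().upper()[0]}{twin_flag}"


# Same (invented) child coded on the fall and spring answer sheets.
fall = pupil_code("Smith", "Robert", date(1960, 5, 14), "M")
spring_legal_name = pupil_code("smith", "robert", date(1960, 5, 14), "m")
spring_nickname = pupil_code("Smith", "Bob", date(1960, 5, 14), "M")

print(fall, spring_legal_name, spring_nickname)
```

Because the code is rebuilt from the answer sheet each time, a
nickname on one sheet ("Bob" for "Robert" in this invented case)
yields a different code, and the two records for the same child can
no longer be matched. This is the kind of loss described above.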

The cooperation of the Bureau of Educational Research and Testing
Services at the University also left much to be desired, partly
because of failure to pre-plan and thus to evaluate the amount of
work that was involved in doing this analysis for so many grades.
Programming was a problem, since extant programs were not sufficient
to handle all of the analysis specified by the Test Service and
Advisement Center, complicated by major changes in the equipment
available at the Computation Center. In spite of all of these
difficulties, including very poor service from the Measurement
Research Center in returning the data tape in proper condition and
within a reasonable time, some significant information was derived
and provided to local officials for action.

The Spring Program

In the Spring of the year (for the most part of May) the Stanford
Achievement Tests were readministered at the local level to children
who presumably were in Title I. Local school districts were allowed
to order as many tests as they wished and no attempt was made to
establish the fact the children so tested were indeed in Title I. The
amount of test materials purchased under this arrangement greatly
exceeded the number of children enrolled in Title I and,
consequently, the results which came back from MRC were virtually
worthless as a basis for evaluating Title I outcomes.

Summary and Conclusions for 1968-69 

From the above, it would appear that the program for '68-'69 was not
too successful. Yet it is a recognized research principle that a
negative result is oftentimes as significant as a positive one for
indicating need for change. From this point of view, the '68-'69
program was successful. Furthermore, it did provide some very
essential baseline data in both the areas of intelligence and
achievement in terms of which some significant evaluation of
educational output could be made for the State even though not for
Title I. Out of this experience came planning for the '69-'70 program
which promised radical changes in this situation.

1. Basically, the statewide data reported was adequate insofar as it
went but provided for no systematic comparison of capacity and
achievement to see to what extent children were working up to their
optimum level, assuming for the moment that the mental ability test
was adequate to determine this level.

2. The Stanford Achievement Test provided no national normative data
for Spring that could be depended upon, especially in the case of
Grade 2 where two different forms were used, Form X in the Fall, Form
W in the Spring, as well as two levels.



Grade equivalents, as the author has noted in a number of places and
under different circumstances, are completely unsatisfactory for
measuring the amount of learning taking place over the relatively
short period of seven months. Grade equivalents for Stanford were
based upon averages of grade groups tested in March of EACH school
year. Thus annual, not school year, data were available for each
grade in the national standardization program for determining the
median scores of children tested at Grade 1, Grade 2, Grade 3, Grade
4, etc. These grade medians were plotted in terms of raw scores
transformed to a single base for the grades involved and a continuous
curve was drawn through the plotted points. From this, grade scores
were derived which were used as a basis for tying together forms.
These grade scores and the grade equivalents, elsewhere given, were
identical. When they were called grade scores, the decimal point was
eliminated; with the decimal point in, they became grade equivalents.

Increment in score from subject to subject always varies greatly,
depending upon the degree of specificity of instruction in the local
school. Grade equivalents in arithmetic, for example, have an
entirely different significance than grade equivalents in reading or
vocabulary because the effect of the environment is very much greater
in the latter two instances than in areas as specifically related to
school learning as arithmetic.

Furthermore, an examination of the data not only for this State but
for many other communities and administrative units around the
country seemed to indicate quite clearly that the Stanford norms were
not truly representative of national achievement. Instead it seemed
quite clear that the population of students tested for norm purposes
was somewhat above average in mental ability. This was evident from
an examination of the Stanford accessories as well as from the New
Hampshire data from Otis-Lennon which showed the state group to be
slightly above average in ability but seriously below the national
norms on several Stanford tests. This is not terribly serious if
local norms are used, as was strongly advised by the writer. These
were provided by MRC in terms of state stanines and local stanines
and percentile ranks.

An even more serious situation existed in Grade 2 
due to the fact that a different battery level was 
used in the Spring as compared to the Fall, thus 
making almost impossible a comparison of the scores 
to determine the amount of learning taking place 
since the content was different. 

It occasionally happens that a test which is adequate for use in the
Fall is not adequate for use in the Spring of the same school year
and, in this particular case, the situation was exaggerated because
the Primary I Battery of Stanford used in the Fall was itself much
too easy for many of the children taking the test, resulting in
negatively skewed distributions, i.e., the piling up of scores at the
upper or high score end of the scale.






N.H. Statewide Testing Program Evaluation - II



3. Great delays were encountered in getting the tape from MRC and the
tape, when received, was not wholly satisfactory. Program
difficulties and scheduling difficulties at BERTS rendered the
results less and less useful as the time between testing time and the
return of data to the communities from the supplemental analysis was
lengthened.
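
The derivation of grade equivalents described under point 2 can be
sketched as follows. The median raw scores are invented for
illustration, the March testing date is taken as grade placement x.6
(an assumption), and straight-line interpolation stands in for the
continuous curve that was drawn through the plotted medians.

```python
from bisect import bisect_right

# Hypothetical median raw scores of grade groups tested in March,
# stored as (grade placement, median raw score). Not Stanford data.
MEDIANS = [(1.6, 12.0), (2.6, 24.0), (3.6, 34.0), (4.6, 42.0)]


def grade_equivalent(raw):
    """Interpolate a grade equivalent between the March medians."""
    raws = [score for _, score in MEDIANS]
    if raw <= raws[0]:
        return MEDIANS[0][0]
    if raw >= raws[-1]:
        return MEDIANS[-1][0]
    i = bisect_right(raws, raw) - 1
    (g0, r0), (g1, r1) = MEDIANS[i], MEDIANS[i + 1]
    return round(g0 + (g1 - g0) * (raw - r0) / (r1 - r0), 1)


def grade_score(equivalent):
    """A grade score is the grade equivalent without the decimal point."""
    return round(equivalent * 10)


ge = grade_equivalent(29)
print(ge, grade_score(ge))
```

In this sketch only the four March medians are observed values; every
point between them is interpolated. That is consistent with the
objection raised above to using such units for measuring growth over
seven months within one school year.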

It would be completely senseless to try to place the major part of
the blame for the failures involved in the '68-'69 data on any
organization or group. It makes much more sense to take a look at
these data in terms of what they did contribute to our knowledge
about New Hampshire pupils and also to our awareness of the need for
change in the program for the next year to improve the output. Since
this report is limited as to length, no further analysis or reporting
of '68-'69 data will be undertaken. Instead the concentration will be
on the program for '69-'70 which was improved in major ways, although
still, in the writer's opinion, falling far short of an ideal program
for the purpose intended; namely, the evaluation of Title I outcomes.



SECTION II



Steps Taken to Improve 
The 1969-70 Statewide Testing Program 
And the Results of These Efforts 



After a number of planning conferences, involving members of the
Department of Education as well as the Title I staff, it was decided
to continue with the 1969-70 program in accordance with the original
plan for a three-year cycle. Mr. James Carr, State Director of
Guidance, was involved in these discussions although he subsequently
was away on leave for graduate study and was not involved in the
1969-70 program. This left a void which was filled by involving the
Title I staff far more intimately with the testing program than had
been true before.

As a result of the initial conferences, prior to any commitment to
Harcourt, Brace & World for MRC service, an agreement was reached to
strengthen the program in a number of ways.

1. Nine Title I project categories plus a tenth general category were
developed. This was tried first for the 1968-69 program but was not
effective because of inadequate information. These categories were
the result of careful study of the project applications which had
been approved during the previous year by the Title I staff.

2. To implement the collection of this information in 1969-70, the
writer and the staff of the Title I office developed two IBM card
questionnaires; the first of which was labelled the "IN" card or
registration card, and the other the "OUT"



card or termination card. A photocopy of these two cards is attached.
These instruments provided the medium for collecting information
concerning the distribution of children according to the type of
Title I project in which they were involved.
It was known, for example, that a very large num- 
ber of Title I students were involved in correc- 
tive reading programs of one kind or another, but 
these data had not been quantified, nor was there 
any specific information concerning the number of 
cases involved in programs other than reading. 

Furthermore, it was considered very desirable to 
determine to what extent Title I children remained 
in the project to which they were assigned for the 
full year or stayed only for some period less than 
a school year. All of this information, plus other 
information as may be seen by examining the cards 
themselves, became available through the medium of
the "IN" and "OUT" cards.

These cards were distributed by the Title I office with instructions
to the local communities to complete an "IN" card for each child
taken into the program at whatever time this occurred and, at the
termination of his involvement in the project, to complete an "OUT"
card.

A candid evaluation of the functioning of these cards indicates that
they have contributed enormously to the body of information about
Title I children. They reveal some discrepancies about the number of
cases in Title I enrolled versus the number of tested cases. Fuller
discussion of the results of the "IN" and "OUT" card data analysis
comes at a later point in the report. It is important, however, to
point out here that the completion of the "IN" and "OUT" cards has
not quite reached the goal intended at the time they were devised of
providing, in addition to data for statistical analysis, an immediate
sight file in the Title I office of all enrollees very soon after the
school year starts. Problems have been encountered in distributing
and collecting the cards and it is felt that there is room for
improvement in this area.

[Photocopies of the "IN" (registration) and "OUT" (termination) cards
appear at this point in the original. The card text is largely
illegible in this copy. The fields that can be made out on the "IN"
card include sex, twin status, grade, school, town, classroom
teacher, nature of project (categories 1-9), involvement in another
Title I program, number of hours the student is involved in the
program, type of instructor, and whether the student was in a Title I
program last year.]

3. Since the supervisory unions did not have any trained personnel to
handle the testing program, it was felt important to establish a
chain of command within each supervisory union so as to delegate
responsibility on a pyramidal basis, with one person at the top being
clearly responsible for the functioning of the program at the local
level. Therefore, each supervisory union was requested to select an
individual, presumably someone that had some previous training in
measurement or some inclination to work with such data, to act as the
supervisory union coordinator. Within the supervisory union,
responsibility was delegated down the chain of command to the
principals of the buildings and to the teachers for seeing to it that
the tests were administered on schedule, that the protocols were
properly cleaned up, and that ID information was complete at the time
of shipment to MRC for scoring. This was to include such things as
accurate completion of the header sheets, arrangement of the answer
sheets in alphabetical order within class, and the removal of
unscoreable sheets from each class, especially those where a great
many double marks appeared, where the marks were faint and probably
not scoreable, or where very large numbers of items had been omitted
so that it was obvious that the test was clearly invalid.

Especially great emphasis was put on the need to have accurate ID
information for each child and an attempt was made to have the
child's name coded in a standard fashion in order to make use of a
self-generating code for each pupil as described earlier. A
preliminary experiment was carried out with data from the 10th grade
statewide program administered and scored by the UNH Bureau of
Educational Research and Testing Services, in which Digitek answer
sheets (far better designed for the purpose) had been used. The
results of this analysis showed that only a tiny fraction of the
total failed to complete the ID information accurately, amounting to
some 20 to 25 cases out of more than 10,000 tested. This encouraged
us to believe that we might get similar results at the lower grades.

However, the very inadequate design of the Stanford answer sheets and
the fact that the Stanford sheets provided no space for coding
birthdate left us completely dependent for this information on the
Otis-Lennon answer sheet. These design inadequacies, interacting with
carelessness on the part of the teachers in supervising the coding of
this information by pupils, resulted in data that proved not to be
very useful in establishing such a pupil code. Thus, the matching of
fall and spring data for the Title I children tested was again
frustrated, to say nothing of a random sample of children across the
state who were also tested. These failures complicated the analysis
to the point where this constituted a major stumbling block in all
subsequent studies, seriously eroding the basic validity of the data
herein reported.

In an early meeting of the testing supervisors from the supervisory
unions, an attempt was made to get the schools to agree to make use
of Social Security numbers for coding purposes, but this proved to be
totally unsuccessful, although some supervisory unions did indicate a
serious interest in this possibility for the future.

The matter of identifying pupils by code number within the State of
New Hampshire for testing purposes still remains an unsolved problem
which the State Department of Education must, at some time, frankly
face up to if it is ever to be possible to collect data concerning
individual students on a cumulative basis for follow-up purposes.
This, however, is not a major consideration of this report, since in
this report we are concerned almost solely with the determination of
the effectiveness of the Title I program in 1969-70.

4. An attempt was made to set up specific
dates within which the testing program would be



completed, the protocols sent from the schools to the supervisory
union offices, and finally shipped from the supervisory union offices
to MRC. It was our intent to keep accurate account of the dates of
the receipt of the material in the supervisory union offices and the
arrival dates of this material at MRC but this did not work out in
practice. Many communities were late in returning their data to MRC
and some failed to return them according to the specified method,
namely parcel post, special fourth class mail, special handling.
Consequently, the date originally set up for processing our data at
MRC was missed, making it necessary for them to work us into their
schedule when they could, after receiving word from the State
Department of Education to begin scoring.

Quality of the MRC Service 

During the 1968-69 testing program, the quality of evaluative
material returned by MRC, as specified by Programs and Services of
Harcourt, Brace & World, was quite adequate, with one or two minor
exceptions.

The contract originally negotiated indicated 
that the same service would be maintained for the 
same price over a period of three years. However, 
after conversations during the summer subsequent
to the 1968-69 program, there arose a serious mis- 
understanding between Harcourt, Brace & World, 
Inc., and the State Department of Education - es- 
pecially the Title I office - with the result that 
the information returned for the 1969-70 program 
was substantially lacking in the degree of com- 
pleteness that characterized the 1968-69 program. 

Moreover, the service from MRC certainly was not improved, although
who was at fault in this respect it is hard to say. To some extent,
at least, it was the failure on the part of the State Department to
make it clear that the same service was expected and the subsequent
failure of the representatives of Harcourt, Brace & World to write
specifications to insure this same degree of completeness. However,
there was no consequent change in price for this less adequate
service!

Because of the difficulty in getting the IBM cards shipped intact
from MRC in 1969, it was decided to go to magnetic tape. However, the
magnetic tape was not shipped from MRC until all other aspects of the
contract had been completed. The delay, amounting to some months, in
getting the tape to New Hampshire tremendously retarded the analysis
of the data by the Bureau of Educational Research and Testing
Services of the University of New Hampshire. There were programming
problems involved also because of incomplete specifications from MRC
and errors in the tape. Although the analysis requested from BERTS
was essentially simple and similar programs had been carried out
routinely in many places around the country before, nevertheless
delays of substantial length were incurred. Furthermore, because of
the experimental nature of the program, new ways of looking at the



N.H. Statewide Testing Program Evaluation - IIIA 



data kept occurring to the writer and to the members of the staff of
Title I, necessitating expanding the analysis beyond the points
originally contemplated. The problems of lost and poorly shipped
materials continued to plague the program in spite of repeated
complaints to Harcourt and MRC.

In summary, one must say that the administration of the program still
showed many inadequacies. Perhaps this is inevitable in any
experimental program carried out by people, many of whom are
inexperienced at this sort of thing. However, the data analysis which
was carried out greatly exceeds the typical analysis of such data,
including as it did certain novel features such as the coordination
of the "IN" and "OUT" cards with the test information, the testing of
a representative state sample to provide a base for comparison of
gains for the Title I children over seven months, and other things
that will become evident as this report is completed. Care has been
taken to point out these positive aspects of the program as well as
the negative ones, and the improvements over 1968-69 have been
noteworthy.

During the summer of 1970, it was decided that all responsibility for
statewide testing after that time would be transferred from Title I
to the Director of Research and Testing of the State Department of
Education. This decision was fully carried out and Title I did not
participate at all in the collection of statewide testing data in
1970-71.

Two years of testing with Stanford, Form X, provided as much
data-base information as was needed or desirable so far as Title I is
concerned. The continuation of the use of Form X of Stanford in the
upper grades in the fall of 1970, using over once again the same
booklets that had been stored in the schools for the three-year
period, was protested but economics again prevailed and the booklets
were reused in spite of clear evidence of coaching in the 1969-70
program. The program was carried out through the auspices of the
Bureau of Educational Research and Testing Services of UNH and the
results are generally unknown to this writer. There was every reason
to believe that this coaching would be accentuated through use of the
same form a third year. Thus, Title I does not have the types of data
reported in this document for the '70-'71 school year, except that
"IN" and "OUT" cards are available which will reveal the extent to
which the type of projects carried out are similar both in kind and
proportion to what they were in '69-'70.



SECTION III 
Part A 

Chronological Age Distribution in New Hampshire
Compared with the National Group



It may not seem significant at first glance, but the distribution of
chronological ages within a defined group is actually very important
in the interpretation of test data. Only the inexperienced or
careless analyst forgets this. An older group (or child) will
generally do better than a younger one even if the difference in age
is only two or three months. For example, a child entering school
about as late as he can enter and still be within the age-in-grade
group, i.e., within the range of twelve months specified by local law
or regulation as "normal", has a definite advantage over the youngest
child admitted that year. This is just an offshoot of the basic fact
that cognitive abilities as measured by mental ability or
intelligence tests do contribute a great deal to the in-school
performance of children, either separately or when considered as a
group. Contrary trends are found in this area. Many communities are
making day care and nursery school attendance an authorized and
official part of public school instruction. Head Start is emphasizing
structured pre-school experience for the disadvantaged. Other
instances could be quoted.

Upward modification of the lawful entrance age to kindergarten or
first grade is also being advocated and is being implemented in New
Hampshire. This will have serious repercussions years later when the
group as a whole (if the practice becomes accepted generally in the
state) becomes older than the national norm group, especially in
light of the slightly above normal range of brightness now shown to
be characteristic of our state population.

In this writer's opinion, there is no virtue of added age as a
prerequisite to school entrance unless the home environment is adding
something besides typical days-of-life experiences to the child's
"readiness" for school, which seems unlikely.

In Tables IIIA-1 and 2, we find the distribution of ages for boys and
girls separately and for the total state groups in Grades 4 and 6
tested in the Fall of 1969. (The total group includes all those who
did not code sex on the Otis-Lennon answer sheet from which these
data came.) For no sensible reason that this author can discover, the
coding of sex was omitted on many of the Otis-Lennon answer sheets.



We find that the New Hampshire total group
appears to be a month younger than the group on
which the Analysis of Learning Potential was






Table IIIA - 1

New Hampshire Statewide Testing Program - Fall 1969
Distribution of Chronological Ages
Separately by Sex and for the Total Group

Grade 4

                                   Statewide       By Way of Comparison:
Age in                            Total Group      Nationwide ALP Norm
Years & Months     Boys   Girls      I*    II**    Group, Fall 1967

14- 4 to 14- 9        1       0       1       1         2
13-10 to 14- 3        0       1       1       1         1
13- 4 to 13- 9        3       1       4       5        12
12-10 to 13- 3        2       5       7       7        12
12- 4 to 12- 9        4       5       9      11        34
11-10 to 12- 3       16      13      29      36        70
11- 4 to 11- 9       89      38     127     142       162
10-10 to 11- 3      183      88     271     288       346
10- 4 to 10- 9      575     292     867     936     1,032
10- 3               132      81     213     224       215
10- 2               124      82     206     225       246
10- 1               149      91     240     256       303
10- 0               181      86     267     285       456
 9-11               162     115     277     295       558
 9-10               231     155     386     404       635
 9- 9               342     356     698     729       830
 9- 8               349     388     737     780       795
 9- 7               394     390     784     839       890
 9- 6               350     361     711     747       820
 9- 5               362     393     755     786       871
 9- 4               353     392     745     783       763
 9- 3               336     408     744     772       835
 9- 2               354     400     754     802       785
 9- 1               330     427     757     802       756
 9- 0               307     379     686     716       611
 8-11               264     307     571     599       441
 8-10               191     246     437     469       317
 8- 4 to  8- 9       10      26      36      37       131
 7-10 to  8- 3       20      24      44      46        47
 7- 2 to  7- 9        1       0       1       1         0

Total N-Grade 4   5,815   5,550  11,365  12,024    12,976
N Mid-12 Months   3,972   4,447   8,379   8,824     9,149
N Mid-18 Months   4,911   5,057   9,968  10,513    11,127
% Mid-12 Months   68.30   80.12   73.73   73.39     70.51
% Mid-18 Months   84.45   91.12   87.71   87.43     85.75
Median Age          9-6     9-5     9-7     9-6       9-7

 *Students Who Coded Sex
**Includes Students Who Did Not Code Sex






Table IIIA - 2

New Hampshire Statewide Testing Program - Fall 1969
Distribution of Chronological Ages
Separately by Sex and for the Total Group

Grade 6

                                   Statewide       By Way of Comparison:
Age in                            Total Group      Nationwide ALP Norm
Years & Months     Boys   Girls      I*    II**    Group, Fall 1967

17- 4 to 17- 9        1       0       1       1        --
16-10 to 17- 3        0       0       0       0        --
16- 4 to 16- 9        0       0       0       0        --
15-10 to 16- 3        0       0       0       2         5
15- 4 to 15- 9        1       0       1       1        --
14-10 to 15- 3        3       4       7       8        --
14- 4 to 14- 9        7       5      12      13        --
13-10 to 14- 3       19      11      30      32        70
13- 4 to 13- 9      116      46     162      --        --
12-10 to 13- 3      193     109     302     327        --
12- 4 to 12- 9      552     301     853     908        --
12- 3               127      59     186     200       226
12- 2               135      72     207      --        --
12- 1               119      82     201     214       309
12- 0               159      95     254      --        --
11-11                --      --      --      --        --
11-10                --      --      --      --        --
11- 9                --      --      --      --        --
11- 8                --      --      --      --        --
11- 7               376     402     778     816       809
11- 6               360     348     708     744       736
11- 5               385     378     763     800       735
11- 4               336     378     714     753       751
11- 3               308     386     694     726       855
11- 2               331     376     707     729       874
11- 1               316     411     727     766       754
11- 0               324     374     698     726       562
10-11               248     298     546     569       462
10-10               177     269     446     463       327
10- 4 to 10- 9       27      42      69      70       160
 9-10 to 10- 3       46      44      90      93        73
 9- 3 to  9- 9        1       0       1       1         0

Total N-Grade 6   5,699   5,438  11,137  11,703    12,756
N Mid-12 Months   3,878   4,323   8,179   8,556     8,735
N Mid-18 Months   4,733   4,876   9,609  10,073    10,683
% Mid-12 Months   68.05   79.50   73.44   73.11     68.48
% Mid-18 Months   83.05   89.67   86.28   86.07     83.75
Median Age         11-6    11-5    11-6    11-6      11-7

 *Students Who Coded Sex
**Includes Students Who Did Not Code Sex
-- Entry not legible in this copy




standardized.1/ The median age of the New Hampshire total group in
Grade 4 in the 1969-70 program, with an "N" of 12,024, is 9 years and
6 months, while the median for the ALP sample is 9 years 7 months.
Both sets of data ostensibly were obtained in October of the school
year, although the spread of the testing dates in either sample is a
factor of an unknown importance in this comparison. (This difference
may be illusory because of slightly different times of year for the
collection of the data.) For all practical purposes, we can say that
the New Hampshire group now is fairly typical of the national sample
as regards distribution of chronological ages at Grade 4. (National
median, 9 years 7 months; New Hampshire, 9-6+.) The data for Grade 6
are consistent in this regard. The median age in Grade 6 is 11 years
and 7 months in the Analysis of Learning Potential national sample
and about one month less in the New Hampshire population.
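
The Grade 4 figures quoted above can be checked against the Statewide
Total Group II column of Table IIIA-1. The sketch below is
illustrative only. The function names are hypothetical, and the
"Mid-12 Months" window of 8-10 through 9-9 is an assumption, chosen
because it reproduces the table's N of 8,824.

```python
# Statewide Total Group II, Grade 4, youngest to oldest (Table IIIA-1).
DIST = [
    ("7-2 to 7-9", 1), ("7-10 to 8-3", 46), ("8-4 to 8-9", 37),
    ("8-10", 469), ("8-11", 599), ("9-0", 716), ("9-1", 802),
    ("9-2", 802), ("9-3", 772), ("9-4", 783), ("9-5", 786),
    ("9-6", 747), ("9-7", 839), ("9-8", 780), ("9-9", 729),
    ("9-10", 404), ("9-11", 295), ("10-0", 285), ("10-1", 256),
    ("10-2", 225), ("10-3", 224), ("10-4 to 10-9", 936),
    ("10-10 to 11-3", 288), ("11-4 to 11-9", 142),
    ("11-10 to 12-3", 36), ("12-4 to 12-9", 11),
    ("12-10 to 13-3", 7), ("13-4 to 13-9", 5),
    ("13-10 to 14-3", 1), ("14-4 to 14-9", 1),
]


def median_interval(dist):
    """Return the label of the age interval holding the middle case."""
    half = sum(n for _, n in dist) / 2
    running = 0
    for label, n in dist:
        running += n
        if running >= half:
            return label


def percent_between(dist, first, last):
    """Percent of all cases from interval `first` through `last`."""
    labels = [label for label, _ in dist]
    lo, hi = labels.index(first), labels.index(last)
    inside = sum(n for _, n in dist[lo:hi + 1])
    return round(100 * inside / sum(n for _, n in dist), 2)


total = sum(n for _, n in DIST)
median = median_interval(DIST)
mid12 = percent_between(DIST, "8-10", "9-9")
print(total, median, mid12)
```

Under these assumptions the column sums to the N of 12,024, the
middle case falls in the 9-6 interval, and the twelve-month window
holds 73.39 percent of the group, in agreement with the table.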

It would hardly be fitting to leave this topic without commenting on
the wide range of ages to be found within either Grade 4 or 6.
Considering the statewide group in Grade 4, the effective ages range
from 7 years 10 months (first age level for which there are a
noticeable number of cases) to 12 years and 3 months (same
limitation.) Thus the data show a real spread of more than four
years. In Grade 6, the effective spread, 9 years 10 months to 14
years and 3 months, is, again, over four years. Thus retardation is
clearly an accepted policy here, as it is generally, and its effects
are cumulative from grade to grade. These over-age duller children
have, nevertheless, increased in learning potential (mental power,
not brightness) and thus are made more nearly equal by their
retardation to their grade population peers in ability to handle the
work of the grade. This is why the IQ is inappropriate as a basis for
comparing capacity and achievement except for children within the age
controlled range. Young-bright and older-dull approach each other in
ability to do school work.
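
The age spreads quoted in this paragraph are easily verified by
converting each age to months. The helper names below are
hypothetical and the snippet is offered only as a check on the
arithmetic.

```python
def to_months(age):
    """Convert an age written as 'years-months' (e.g. '7-10') to months."""
    years, months = (int(part) for part in age.replace(" ", "").split("-"))
    return years * 12 + months


def spread(youngest, oldest):
    """Return the spread between two ages as (years, months)."""
    return divmod(to_months(oldest) - to_months(youngest), 12)


grade4 = spread("7-10", "12-3")   # effective range in Grade 4
grade6 = spread("9-10", "14-3")   # effective range in Grade 6
print(grade4, grade6)
```

Both grades give a spread of 4 years 5 months, which agrees with the
statement that the effective range is more than four years at each
level.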

To turn now to the age distributions separately by sex in Grade 4, it
seems that girls in New Hampshire are a little over one month younger
than boys. The same difference is found in Grade 6. This one month
difference takes on more significance when we examine the results of
the learning ability test (Otis-Lennon) discussed next, in which it
is evident that this measure of cognitive learning potential also
favors the girls at both grade levels.



1/ The Analysis of Learning Potential was standardized in October
1967 and constituted the latest large and scientifically
representative group available for comparison. The Metropolitan
national sample of Fall 1969 confirms the ALP data but age data were
not available in final form as shown here at the time of this report.




Perhaps the author may be excused if he does 
some further analysis of this largely overlooked 
problem. Many people see the increase in entrance-
to-school age as primarily a logistics problem.
In other words, it is reasoned that if the child 
is kept out of school until he is "ready" for 
kindergarten or first grade, he will be more 
likely to move through the grades at a steady 
pace, one year of school for one year of chrono- 
logical age. Since this would substantially cut 
down on repeaters and thus get more children 
through school in the normal span of twelve years 
available for public education it would eventu- 
ally, as the reasoning goes, save money for the 
taxpayer. 

However, there is another way of looking at
this problem, namely, the substantially large
number of children who are above normal in their
cognitive abilities who are unfairly denied the
opportunity to enter school when they are capable
of benefiting by instruction and even to move
through school at an accelerated pace, if they
are able. Such acceleration should not come about 
by double promotion but by widespread acceptance 
of individualized progress. 

More and more the educational community is 
now seeing the school experience as necessarily 
being adjusted to the needs and capacities of the 
individual student. The idea that there is a 
fixed curriculum for a particular grade subject 
by subject, through which the individual proceeds 
in a kind of lock-step fashion, is totally falla-
cious. There is no generally accepted hierarchy
of gradedness in the curriculum in any school sub-
ject, not even in arithmetic which, by the nature
of the subject, might most closely approximate it.

There is a notion that there is a development 
age at which a child is "ready" for school and 
before which he is not. Back of this is the idea
that some children are not ready to move out of 
the home environment into the broader group ex- 
perience of the public schools at the normal time 
in their chronological age dimension. If there 
were a fixed curriculum, it would be true that 
some children would never reach the state where 
they could comfortably move out of the home circle 
into the larger school world on an equal footing. 
These are the educable and trainable children 
whose lot is certainly not an enviable one in to- 
day's schools. 

Probably the truth of the matter is that the
schools and the teachers in the schools are not
ready, rather than that the children are not. It all
depends on the essential goals of public education.

One could make a strong case for the idea 
that in a democracy, where education is publicly 
supported, it should be illegal to require that a 
child, not obviously a danger to himself or other
children, be kept out of school when he has
reached the mandatory school entering age.
Parents might be advised to keep a child out of 



N.H. Statewide Testing Program Evaluation - IIIA 



school because, in the opinion of the school per-
sonnel, the child was not "ready" for the type of
school experience a given school or system is
ready to give. Along with such a recommendation
should go a statement of philosophy stating the
school's objectives and indicating that there is
no intent or desire to individualize instruction.
In other words, the curriculum for each grade is 
clearly set forth and the child must accomplish 
this curriculum in lock-step fashion if he is to 
be allowed to enter the group. 

It is devoutly hoped that such a philosophy 
shall rapidly give way to a concept of education
as "timely incremental learning" at a rate each
child sets for himself and not at the rate the
system dictates. At the present time, few schools
accomplish this degree of freedom, although more
and more are moving in this direction.

If one were to look at the problem solely 
from a cost and logistical point of view and be 
entirely consistent in doing so, the logical way 
to approach the problem would be to say that the 
public purse would support 12 years of school ex- 
perience (or 10 or 14 as the case may be) after 
which the parents would have to assume responsi-
bility for financing the child's education REGARD-
LESS OF HIS STATUS AT THE END OF THE TWELVE YEARS.
This is now normally done after a student gradu-
ates from a senior high school, although there is
growing awareness of the need for formal educa-
tion through at least two more years.

The Age Controlled Sample and the Modal Age 

Many years ago, Dr. Truman L. Kelley, then of
Harvard University, developed the concept of the
modal age. Dr. Kelley was interested in the prob-
lem of standardizing tests and wished somehow to
reconcile the inconsistencies arising from the
construction of age-oriented norms as compared to
grade-oriented norms. To accomplish this, he cre-
ated the idea of having children who were at age
for grade used as the standard norm group. This
would be a range of twelve months if all children 
whose birthdays fell within a calendar year would 
be allowed or required to enter school. This so- 
called modal age was to be determined by rather 
sophisticated statistical means but experimental 
evidence later showed that it could be more easily 
and almost as exactly determined simply by finding 
the range of twelve months of age containing the 
largest number of cases for any similar range in a 
given distribution. 
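
The simplified rule just described can be sketched in a few lines of Python. This is only an illustration of the rule, not the procedure actually used by Dr. Kelley or by the test publishers; the function name `modal_age_window` and the sample counts are hypothetical, and ages are assumed to be tabulated in whole months.

```python
def modal_age_window(counts_by_month, width=12):
    """Return (first_age, last_age, cases) for the `width`-month age range
    holding the most cases. `counts_by_month` maps age in months -> cases."""
    if not counts_by_month:
        raise ValueError("no ages supplied")
    lo, hi = min(counts_by_month), max(counts_by_month)
    best_start, best_total = lo, -1
    # Slide a window of `width` consecutive months across the age range.
    for start in range(lo, max(lo, hi - width + 1) + 1):
        total = sum(counts_by_month.get(age, 0)
                    for age in range(start, start + width))
        if total > best_total:  # on ties, keep the youngest window
            best_start, best_total = start, total
    return best_start, best_start + width - 1, best_total


# Hypothetical Grade 4 counts (age in months -> number of pupils).
sample = {106: 5, 108: 40, 110: 90, 112: 120, 114: 130, 116: 125,
          118: 110, 120: 60, 124: 30, 130: 12, 140: 4}
window = modal_age_window(sample, width=12)
```

With these made-up counts the window runs from 109 through 120 months. The `width` argument is left adjustable so that a wider range can be examined in the same way.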

Later, after this concept had been applied in
the norming of the Stanford Achievement Test (1940
Edition), it was discovered that the range of
twelve months was not adequate to take care of
variations in entrance age and variations in pro-
motional policies from one place to another around
the country. Moreover the modal age population
proved to be above average in measured intelli-
gence. Subsequently, in the Metropolitan Achieve-
ment Test series the modal age idea was modified




to include a range of eighteen months, thus allow-
ing for variation in entrance age and also for
differences in promotional policy. The 1958 edi-
tion of Metropolitan Achievement Test series was
standardized on such an age controlled sample and
the 1970 edition also will provide normative in-
formation for the age controlled sample.

This age controlled sample is an important 
concept because it provides an appropriate norm 
for the average child who has been allowed to move 
through school at the usual pace, i.e., one grade
for each year of chronological age. For any child
falling within the age controlled sample for
his grade, one could say that the likelihood is
great that his exposure to instruction had been
more or less standard for a child of his age. For
the older child who had been held back one or more
years and thus is outside the age control group,
two interpretations of score are necessary for
complete understanding, one based upon his grade
status regardless of his age and the other upon
his age status regardless of his grade. This also
should be done, of course, for the younger chil-
dren who fall below the age controlled sample.
These are the children who have been allowed to 
enter school at a younger than normal age or who 
have moved ahead of the group because of a more 
than average learning rate. 

The age controlled sample provides a norm
sample that remains comparable in both range of
age and total in-school experience from grade to
grade for 80% to 90% of children in school.

The age controlled sample in the distribu-
tions of chronological ages for the State of New
Hampshire Grades 4 and 6 has been recorded, using
a one month step interval for the requisite 18-
month age range. Note that this range is compar-
able to the age controlled sample for the country
as a whole. About 87% of children in New Hamp-
shire in Grade 4, for example, fall within the age
controlled sample as compared to 86% in the nation-
al norm sample for the Analysis of Learning Poten-
tial. In Grade 6, the comparable percentages are
86% for New Hampshire and 84% according to the ALP
data for the nation as a whole.

It would appear therefore, in conclusion,
that in this particular way of looking at the age
data New Hampshire also is typical. The fact of
this typical character of the age composition of
the New Hampshire population should be kept in
mind later when this report deals with the achieve-
ment of New Hampshire students as compared with
the national norm.



N.H. Statewide Testing Program Evaluation - IIIB



SECTION III 
Part B

Evaluating the Measured Mental Ability of
New Hampshire Students against National Norms



The Otis-Lennon Mental Ability Test was used 
in this program in both 1968 and 1969 with nearly 
identical results. There is great misunderstand- 
ing about the function of such tests as measures 
of cognitive learning ability. This ability is 
perhaps the most important dimension of school 
learning potential. Psychologists no longer con- 
sider it as a measure of native or inherited abil- 
ity; it is a measure of the ability of an indi- 
vidual to respond to situations demanding the ex- 
ercise of cognitive abilities, i.e. the ability 
to communicate, know, or understand and reason 
about people, places, or things.1/

In Table IIIB-1 we show the distributions of 
DIQs separately for boys and girls and for the to-
tal tested group in Grade 4 and in Grade 6. Please
note that the sum of the cases tabulated separately 
for boys and girls does not always equal the total 
number of cases because a rather substantial num- 
ber of pupils failed to code sex on their answer 
sheet and, therefore, could not be included in the 
distributions done separately by sex. 

Table IIIB-1 accounts for roughly 12,000 
children in each grade, a very large proportion 
of those in school in these grades. In Grade 4
the median DIQ is 101 and the mean 100.41, which 
is in itself one indication of the normality of 
this distribution. The standard deviation is 
14.75. (In an unselected age population, includ- 
ing all children of a given age regardless of 
grade, the standard deviation would be 16.0 by 
definition.) One can see that the range of DIQs 
is from about 50 to 150, thus covering the total
range of the test. The distribution of DIQs for
Grade 6 also is given in Table IIIB-1. These data
show that the situation does not differ very much 
from that in Grade 4. The median (50th percentile) 
DIQ is 102 and the mean (rounded off) is the same. 
The standard deviation is 15.12 and the range is 
from DIQ 50 to 150. 

Similar distributions are available for Grade 
2 and Grade 8. In Grade 2 the median DIQ is 103 
and the mean also is 103 with a standard deviation 
of 14. In Grade 8 these values are 103 for the 
median, 104 for the mean with a standard deviation 
of 14. The consistency in these results over the 
four grades is notable. Distributions for Grades
2 and 8 are not shown here because the major part 
of this report is limited to Grades 4 and 6 for 
economy's sake. 



1/ See "Intelligence," Encyclopedia Americana,
1971, Vol. 15, pp. 241-245.



In these DIQ tables, we have added one addi- 
tional feature, namely, cumulative percentages, 
which allows the reader to determine the percent-
age of children having DIQs below any desired point
corresponding to the uppermost value in the step 
interval. For example, consider the step interval 
111-113; 84% of the boys in Grade 4 had DIQs of 
113 or lower. By contrast, in the interval 114- 
116, 85% of the girls in Grade 4 had DIQs of 116 
or lower, clearly showing a differentiation between 
boys and girls in favor of girls. 
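
As an illustration of how such cumulative percentages follow from the tabled frequencies, the sketch below recomputes them for the Grade 4 boys. The counts are those read from the Boys column of Table IIIB-1; the function name `cumulative_percentages` and the choice to key each interval by its upper limit are assumptions made for this example only.

```python
# Grade 4 boys, Table IIIB-1: (upper limit of DIQ interval, number of cases),
# listed from the highest interval (150) down to the lowest (48-50).
BOYS_GRADE_4 = [
    (150, 5), (149, 2), (146, 2), (143, 10), (140, 19), (137, 19), (134, 38),
    (131, 54), (128, 93), (125, 97), (122, 120), (119, 194), (116, 250),
    (113, 357), (110, 325), (107, 416), (104, 526), (101, 420), (98, 419),
    (95, 465), (92, 366), (89, 455), (86, 321), (83, 206), (80, 177),
    (77, 135), (74, 92), (71, 65), (68, 48), (65, 33), (62, 12), (59, 15),
    (56, 7), (53, 8), (50, 5),
]


def cumulative_percentages(rows):
    """Map each interval's upper limit to the percent of cases at or below it."""
    total = sum(n for _, n in rows)
    result = {}
    above = 0  # cases in intervals higher than the current one
    for upper, n in rows:
        result[upper] = 100.0 * (total - above) / total
        above += n
    return result


cum = cumulative_percentages(BOYS_GRADE_4)
pct_113_or_lower = round(cum[113])
```

The counts sum to the 5,776 boys reported in the table, and the value for the 111-113 interval rounds to the 84% cited in the text.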

The statistics given in this table are summar- 
ized at the end of the table where percentiles 
(scores) are given corresponding to selected per- 
centile ranks together with the mean and the stand- 
ard deviation of each distribution. 

Means and medians generally agree in near- 
normal distributions and it will be seen there is 
such agreement in this instance. Skewness in a 
distribution will be reflected in a difference be- 
tween these two averages, the mean always being 
toward the "tail" of the distribution from the med- 
ian. (Skewness in lay terms might be described as 
lack of symmetry or lopsidedness.) If a test is 
too easy, for example, there is a tendency for 
cases to pile up at the top end of the score scale 
while the opposite is true if the test is too hard. 
This difficulty was encountered in certain of the 
distributions of subtest raw scores on Stanford. 

The New Hampshire populations in Grades 4 
and 6 as indicated by these distributions are so 
typical of the national scene in both age and 
learning ability that they might well have been 
used, with only minor loss in precision, to pro- 
vide national norms for the Otis-Lennon Test! Even 
the standard deviations, which by definition for 
the national population are 16 for any single age 
group, closely approximate 15 in our grade dis-
tribution where a slight curtailment is usually
found. 

A realistic appraisal of what the demonstra-
ted variability of these brightness measures means
for the New Hampshire educational program is per-
haps the strongest argument for individualizing
instruction, recognizing that many children must
necessarily proceed through the established cur-
riculum at a somewhat different pace than the
average or typical child depending upon their
ability to cope with the school learning situation.

It is for this very purpose that much Title I
money is spent, namely, to help individualize in- 
struction for needful children, especially those 
considered disadvantaged and/or those with known 
and definable weaknesses in learning. It is hoped 
of course that such special help will restore 
these pupils to their rightful place in the dis- 
tribution of scores in the various traits meas- 
ured. What actually happened among tested Title 
I children will be discussed later. However, 
there are few instances where low DIQ children 
ever have become average or high on these meas- 
ured traits except where it has been possible to 
show that the original test was inappropriately 






Table IIIB-1 



New Hampshire Statewide Testing Program 
Distribution of Otis-Lennon Deviation IQs 
Separately for Boys, Girls, and Total Group 
Tested Fall 1969 



            ------------- GRADE 4 -------------    ------------- GRADE 6 -------------
DIQ            BOYS         GIRLS        TOTAL*        BOYS         GIRLS        TOTAL*
Interval     No.  Cum.%   No.  Cum.%   No.  Cum.%   No.  Cum.%   No.  Cum.%   No.  Cum.%

150            5     99     4     99     9     99     ?     99     7     99    12     99
147-149        2     99     2     99     4     99     ?     99     9     99     9     99
144-146        2     99     5     99     7     99     ?     99     ?     99    25     99
141-143       10     99    10     99    21     99     ?     99     ?     99    50     99
138-140       19     99    21     99    41     99     ?     99     ?     99    35     99
135-137       19     99    30     99    54     99     ?     99     ?     99    47     99
132-134       38     99    48     98    92     99     ?      ?     ?      ?   101     98
129-131       54     98    73     98   129     98     ?      ?     ?     97   157     98
126-128       93     97    77     96   177     97     ?      ?     ?     96   181     96
123-125       97     96   130     95   240     95     ?      ?     ?     94   457     95
120-122      120     94   189     92   330     93     ?      ?     ?     89   349     91
117-119      194     92   236     89   446     90     ?      ?     ?     86   533     88
114-116      250     89   293     85   570     87     ?      ?     ?     82   652     83
111-113      357     84   457     79   840     82     ?      ?   386     75   726     77
108-110      325     78   397     71   760     75     ?     73   412     68   818     71
105-107      416     73   454     64   920     69   388     67   439     61   880     64
102-104      526     65   549     56  1133     61   449     60   519     52  1003     56
99-101       420     56   462     46   921     51   439     52   466     43   959     48
96-98        419     49   376     38   841     44   468     44   414     34   929     39
93-95        465     42     ?     31   880     37     ?     36   420     26   906     31
90-92        366     34   344     24   756     29   367     28   276     19   676     24
87-89        455     27   286     18   796     23   313     21   200     13   554     18
84-86        321     19   208     13   565     16     ?     16   123     10   354     13
81-83        206     14     ?      ?   376     12   164     12   104      7   284     10
78-80        177     10   111      6   321      8   139      9    74      6   233      7
75-77        135      7    79      4   230      6    91      7    59      4   160      5
72-74         92      5    53      3   156      4    78      5    39      3   124      4
69-71         65      3    46      2   118      2    58      4    34      2    94      3
66-68         48      2    24      1    75      2    44      3    38      2    85      2
63-65         33      1     9      1    45      1    37      2    24      1    61      1
60-62         12      1    12      1    25      1    14      1    11      1    26      1
57-59         15      1     7      1    24      1    14      1     9      1    26      1
54-56          7      1     5      1    14      1     9      1     4      1    13      1
51-53          8      1     3      1    12      1     8      1     3      1    11      1
48-50          5      1     4      1    10      1    17      1     2      1    19      1

Totals     5,776        5,510       11,938*       5,625        5,371       11,549*

Q3 %ile 75   108          111          110          111          113          111
Q2 %ile 50    98          102          101          100          103          102
Q1 %ile 25    88           93           90           91           95           92

Mean       98.88       102.32       100.41       101.01       103.91       102.32
Std. Dev.  14.94        14.27        14.75        15.68        14.45        15.12

? = entry illegible in the source copy.

*The Distributions of DIQs for Boys and Girls do not sum to the Total Distribution 
because of the failure of a substantial number of pupils to code sex. 






administered. An instance would be the admini- 
stration of the test to a nearly non-English 
speaking child with severe listening and reading
problems. 

The major reason for computing a measure of 
brightness is to determine to what extent we can 
expect above or below normal achievement for any 
child, assuming that he is of normal age for his
grade placement and has had a more or less normal 
experience in school, i.e., has not been absent 
extensively, for example. A much better way of 
making systematic comparisons of capacity and 
achievement is to use grade based norms. Such 
norms are available for the Otis-Lennon both as 
national and as local (state) stanines, and for
each of the Stanford Achievement Tests. 

Raw score distributions on the Otis-Lennon 
test also are available. The median raw score for 
Grade 4, for example, is 32.9 out of a possible 80
and the mean is 34.0. Both of these averages
would put the children in New Hampshire within the
fifth stanine nationally, i.e., within the normal
range for these grades. More precisely, the Grade
4 median raw score would have a percentile rank of
54 and the median raw score of Grade 6 would have 
a percentile rank of 55. 

Local vs National Norms 

Stanines based on score distributions for a 
local population such as a single grade in this 
state, a school district, or even a school, are 
better than national norms when one wants to can- 
cel out systematic population differences in 
achievement from subject to subject when comparing 
capacity and achievement. State and local stanines 
based on raw scores were made available for Stan-
ford and Otis-Lennon in both 1968 and 1969. As
the next part of this study, bivariate (two-way) 
distributions were made for Otis-Lennon raw score 
stanines versus each of the tests in the Stanford 
Battery given in the fall. 

It would be much too space-consuming to re-
produce all bivariate charts for the Otis-Lennon
Mental Ability Test stanines versus all of the sub- 
tests in Stanford for both grades, but one such 
chart has been reproduced from the 1968 program in 
order to illustrate several points which are very 
important concerning the technique for and contri- 
bution of the bivariate distribution as a way of 
comparing school learning capacity as measured by 
a mental ability test with measured achievement in 
school.1/

Selected for this purpose was the Otis-Lennon
Mental Ability Test state raw score stanines versus
similar Stanford Paragraph Meaning stanines in
Grade 4. A sentence or two about stanines may be
important here for the general reader. Stanines
are, essentially, normalized standard scores, which


1/ 1969 data were unavailable to the writer at the
time this report was prepared.



means that they have the characteristics of an
equal-unit scale. For example, the rungs of a
ladder are equally spaced apart and thus may be
considered, in a sense, an equal-unit scale. In a
somewhat analogous sense stanines are like 9-step
ladders having rungs equally spaced.

Perhaps we can look at this bivariate dis- 
tribution without getting more deeply into the 
complications of how stanines are computed at this 
time. 
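
For the reader who does want a glimpse of the computation, the following is a minimal sketch of one conventional way to assign stanines, using the customary 4-7-12-17-20-17-12-7-4 percent split of a group. The report does not state the exact procedure used for the state stanines, so the function names, the mid-interval percentile rank, and the treatment of boundary cases are all assumptions of this example.

```python
from bisect import bisect_right

# Upper cumulative-percent bound of stanines 1 through 8; stanine 9 is the
# remainder. These follow the customary 4-7-12-17-20-17-12-7-4 percent split.
STANINE_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank):
    """Convert a percentile rank (0-100) to a stanine from 1 to 9."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    return bisect_right(STANINE_BOUNDS, percentile_rank) + 1


def stanines_for_scores(raw_scores):
    """Assign each raw score a stanine from its mid-interval percentile rank
    within the supplied (local) group."""
    n = len(raw_scores)
    result = []
    for score in raw_scores:
        below = sum(1 for s in raw_scores if s < score)
        same = sum(1 for s in raw_scores if s == score)
        rank = 100.0 * (below + 0.5 * same) / n
        result.append(stanine(rank))
    return result
```

Under this convention a percentile rank of 54, such as that cited above for the Grade 4 median raw score, falls in the fifth stanine.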

See Chart IIIB-1.

It is easy to see that there is a general
drift in the cell frequencies from lower left to
upper right with a concentration of cases appear-
ing along the mid-diagonal line, shown by the dot-
ted line. If we mark off one stanine cell at each
stanine level to the right and left of this mid-
diagonal by zigzag lines, we will have the mid-
stanine band or range. On Chart IIIB-1 a large
proportion of the cases in the distribution is
found within this band. As a matter of fact, the
percentage of cases falling within this mid-stanine
range is virtually of the same magnitude as the
Pearson product moment correlation coefficient as
reported, which is .73.
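
The correspondence between the two figures can be checked directly from the cell frequencies. The sketch below is an illustration only: the counts are as read from Chart IIIB-1, and where a cell is obscured in the available copy it has been filled in so that every row and column agrees with the printed totals, which is an assumption of this example. The names `band_share` and `pearson_r` are likewise hypothetical.

```python
from math import sqrt

# Cell frequencies read from Chart IIIB-1. Rows are Otis-Lennon stanines
# 1 (first row) to 9 (last row); columns are Paragraph Meaning stanines 1 to 9.
# Assumption: hard-to-read cells were filled in from the row and column totals.
CHART = [
    [115, 123, 107, 112,  30,   7,   2,   0,   0],
    [115, 257, 293, 250, 116,  21,   4,   1,   2],
    [105, 296, 372, 470, 259,  67,  24,   3,   0],
    [ 95, 233, 376, 696, 596, 223,  47,   6,   0],
    [ 40, 124, 254, 579, 853, 513, 249,  18,   1],
    [ 19,  44,  86, 291, 638, 668, 510, 137,  22],
    [  6,   8,  19,  72, 213, 416, 620, 305,  65],
    [  0,   1,   1,  11,  37,  97, 291, 236, 115],
    [  1,   0,   0,   1,   7,  18,  96, 165, 287],
]


def band_share(table, halfwidth=1):
    """Fraction of cases whose two stanines differ by at most `halfwidth`."""
    total = sum(sum(row) for row in table)
    inside = sum(
        count
        for i, row in enumerate(table)
        for j, count in enumerate(row)
        if abs(i - j) <= halfwidth
    )
    return inside / total


def pearson_r(table):
    """Product moment correlation computed from a two-way frequency table."""
    n = sx = sy = sxx = syy = sxy = 0
    for x, row in enumerate(table, start=1):
        for y, count in enumerate(row, start=1):
            n += count
            sx += x * count
            sy += y * count
            sxx += x * x * count
            syy += y * y * count
            sxy += x * y * count
    cov = sxy - sx * sy / n
    return cov / sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))


share = band_share(CHART)
r = pearson_r(CHART)
```

With these counts the share of cases inside the band rounds to .73, and the correlation computed from the grouped stanines comes out close to the reported value.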

All test scores are subject to some kind of 
measurement error, that is, variation due to chance 
factors that cannot be identified or controlled. 
Usually one stanine accounts for better than one
standard error of measurement expressed in raw
scores. Thus we can say, for all practical pur-
poses, that most of the youngsters falling within
this mid-stanine range are indeed performing in
Paragraph Meaning in a manner consistent with their
mental ability stanine as measured by the Otis-
Lennon. Please remember that these stanines are
not based on DIQs but are based upon the distribu-
tions of raw scores. All fourth graders tested in
the State of New Hampshire in Fall 1968 are in-
cluded. Thus, roughly one-quarter of the students
fall outside this mid-stanine band for many rea-
sons, about half above and half below the band.



A scattering of cases show a surprising con- 
trast between performance on the mental ability 
test and performance on the Paragraph Meaning. 
For example, there is one child shown who had a 
stanine of 9 on the Otis-Lennon Mental Ability
Test but only a stanine of 1 on Stanford Paragraph 
Meaning. Such deviant individuals are very rare 
and almost always can be accounted for by some di- 
gression from good testing practice or by actual 
errors in taking the test or in data processing. 
It is standard operating procedure, and a very
highly recommended practice, that students falling
outside the mid-stanine range be studied more care-
fully than those within this range to be sure
there are no such irrelevant factors involved. If 
such factors can be identified, corrective action 
can be instituted in one way or another. This 
would be especially true of the very extreme cases 
noted. 






Chart IIIB-1

A Representative Bivariate Chart or Correlation Plot Showing
Graphically the Degree of Correspondence Between Paired Scores*

                          Stanford Paragraph Meaning Stanine
Otis-Lennon      1      2      3      4      5      6      7      8      9    Total
Stanine

   9             1                    1      7     18     96    165    287      575
   8                    1      1     11     37     97    291    236    115      789
   7             6      8     19     72    213    416    620    305     65     1724
   6            19     44     86    291    638    668    510    137     22     2415
   5            40    124    254    579    853    513    249     18      1     2631
   4            95    233    376    696    596    223     47      6            2272
   3           105    296    372    470    259     67     24      3            1596
   2           115    257    293    250    116     21      4      1      2     1059
   1           115    123    107    112     30      7      2                    496

 Total         496   1086   1508   2482   2749   2030   1843    871    492    13557

r = .73
% in Mid-Stanine Band = .73

*From the Statewide Testing Program for Grade 4, Fall 1968; stanines
in both cases being based on raw scores.



Table IIIB-2

New Hampshire Statewide Testing Program - Fall 1968
Correlations - Otis-Lennon and Stanford Tests

                              Grade 4               Grade 6
Test                      Stanine  Raw Score    Stanine  Raw Score

Word Meaning                .73       .76         .74       .74
Paragraph Meaning           .73       .76         .77       .76
Language                    .74       .75         .78       .78
Spelling                    .63       .66         .61       .61
Word Study Skills           .68       .70
Arith. Computation          .42       .44         .51       .51
Arith. Concepts             .64       .68         .69       .68
Arith. Applications         .66       .69         .71       .72
Social Studies              .72       .74         .76       .75
Science                     .73       .76         .73       .73



N.H. Statewide Testing Program Evaluation - IIIC



If a test result is suspicious in terms of
the student's previous performance, the recom-
mended procedure would certainly be to retest that
individual. Both of the tests being compared have
very short time limits indeed compared to the many
hours spent learning to read. Surely, it is not
fair to a child to judge him on the basis of any
one pair of such test scores. Therefore repeated
testing, preferably cumulative over a period of
years, is highly recommended. Obviously, one men-
tal ability test in the mid-elementary grades is
entirely insufficient; in many cases using one
test result, not substantiated by others, can do
irreparable damage to individual pupils. This is
especially true of those having greatly deviant
scores between normally highly correlated tests.
This disadvantage is maximized if teachers, par-
ents or children accept such a single test result
as definitive and final. GOD FORBID!

Relationship of Measured Mental Ability with
Measured Achievement

Just previously, we have seen what the bivar-
iate distribution or correlation plot looks like
when measured mental ability (Otis-Lennon) is com-
pared with a specific measure of a curriculum ori-
ented variable, namely Paragraph Meaning. The
product moment correlation was .73 for this par-
ticular chart. (Note that when correlations for
these same data were computed in terms of raw
scores, the values reported from the computation
center were slightly higher. Neither one is
"wrong"; using ungrouped raw score data with no
real operational limits on the precision of the
computations in terms of decimal fractions re-
tained, etc., just yields a slightly but insignifi-
cantly higher value. The coarseness of grouping
involved in making the stanine bivariates is a
factor, but this slight difference in the r's is a
small price to pay for the advantages of seeing
exactly what the correlation plot looks like.)

In Table IIIB-2 all remaining correlations of
Otis-Lennon with Stanford Achievement subtests are 
shown for both raw scores and stanines and for 
Grades 4 and 6. 

In such a table Arithmetic Computation almost
always is lowest with Spelling usually next in
order. This is due to the tendency to teach these
subjects in a more nearly rote fashion. Reading
ability and reasoning ability are less important in
these areas than in the other subjects. Even in
communities having a well established modern math
curriculum these correlations will be low because
the tests emphasize outcomes rather than process.

It is here that the finding that the percent
of cases in the mid-stanine band closely approx-
imates the stanine chart correlation coefficient
becomes really important to understanding what
this table means. The coefficient subtracted from
1.00 gives the percent outside the band. When
this value is split in half, roughly one half can
be considered above the diagonal band and one half




below. NOTE: not all cases requiring special at- 
tention are OUTSIDE the band; only those where 
there is a serious inconsistency in paired test 
results. 
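
The arithmetic of this rule of thumb is simple enough to set out explicitly. The sketch below applies it to three of the Grade 4 stanine correlations from Table IIIB-2; it rests entirely on the approximation stated above (that the coefficient roughly equals the share of cases inside the band), and the function name `band_split` is hypothetical.

```python
def band_split(r):
    """Rule of thumb from the text: treat r as the share of cases inside the
    mid-stanine band and split the remainder evenly above and below it."""
    outside = 1.0 - r
    return {"inside": r, "above": outside / 2, "below": outside / 2}


# Grade 4 stanine correlations with Otis-Lennon, from Table IIIB-2.
GRADE_4_STANINE_R = {
    "Paragraph Meaning": 0.73,
    "Spelling": 0.63,
    "Arith. Computation": 0.42,
}
splits = {test: band_split(r) for test, r in GRADE_4_STANINE_R.items()}
```

For Arithmetic Computation, for example, the rule suggests that somewhat more than half of the pupils fall outside the band, roughly .29 above it and .29 below.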

Some youngsters may be in such trouble that
all objective measures uniformly underestimate
their real school learning potential and real
achievement potential. A spastic child or one hav-
ing visual problems will do poorly on objective
tests of ALL kinds. Far too often such children 
will also be under-rated by their teachers. It is 
at this point that one can see most clearly the 
need for a high level of competence among school 
teachers in understanding and using test results 
AND ALL OTHER OBJECTIVE EVALUATION DATA. 

SECTION III 
Part C 

Overall Description of the New Hampshire 
Population as Regards Achievement 



It is very pertinent to ask what kind of 
achievement is characteristic of children attend- 
ing the public schools of New Hampshire in terms 
of national norms on the Stanford Achievement 
Test. In the 1969-70 testing program, exactly the 
same achievement tests were administered as in 
1968-69, namely Form X of Stanford at all tested 
grades (except Primary II, Form W in Grade 2 in 
the spring of 1969). 

The data presented herein are not inconsis- 
tent with the data for 1968-69 but, since strenu- 
ous efforts were made to improve the administra- 
tion of the tests in 1969-70 and since available 
data for spring and fall testing of Title I cases 
is limited to 1969-70, only the 1969 Fall testing 
program data will be considered in this section. 

In Table III-C-1 the raw scores corresponding 
to selected percentile ranks 75, 50 and 25 are 
tabled for each of the tests in the Stanford Bat- 
tery. The raw score percentiles were then ex- 
pressed in terms of Stanford grade equivalents for 
both Grade 4 and Grade 6. These norms are based 
on about 98% of the norm group tested in March
1963, 1% to 2% being eliminated as being extremely
atypical as to age. These Stanford data are shown 
to the left of the vertical line separating the 
Stanford information from the derived Metropolitan 
normative information. 

The New Hampshire norm for Grade 4 would be
a grade equivalent of 4.2 and, similarly, the norm
for Grade 6 would be 6.2 since the tests were ad-
ministered in October.

An examination of this table shows that the 
raw score percentiles, when transformed into grade 
scores or grade equivalents, are below the Stan-
ford national norms on every test in Grade 4 and
also in Grade 6. This is true in spite of the




fact that the State is above the national norm on
the Otis-Lennon Mental Ability Test at these grade 
levels. 

However, an examination of the Stanford Norms
booklet and the Technical Supplement reveals that
the grade populations used for norming the Stan-
ford series, while including very large numbers of
cases, were above average in brightness, the de-
viation IQs on the Otis Quick-Scoring Mental Abil-
ity Test being as follows: Grade 2, 105; Grade 4,
109; Grade 6, 109; and Grade 8, 108. No one knows
why this upward deviation occurred. Unusual care
was used in selecting the norm sample when this
series was standardized but some unknown bias re-
sulted in unrepresentative samples as regards
brightness. This in turn seems to have resulted
in norms that were too "hard". "Hard" is in
quotes because it is evident from an examination
of the relationship between total possible score
and raw score corresponding to the selected per-
centile ranks that the tests actually were on the
easy side. In this context "hard" means that a
lower grade equivalent was assigned to each score
than subsequent experience seemed to justify.

The aim is to have the norm fall in the mid-
dle of the range of possible scores in order to
measure all achievement levels in the tested group.
Stanford seems to have met this criterion in Grades
4, 6 and 8; the Primary I Battery, however, was
much too easy, if one may judge on the basis of
the raw score distributions.

Furthermore, Stanford was standardized in
March while our group was tested in October.
Spring norms generally run "harder" than fall
norms for reasons not too well understood and too
complicated to explain here.

The important question is whether New Hamp-
shire children are really achieving as poorly as
Stanford norms indicate. Some light can be shed
on this matter by considering New Hampshire
achievement in terms of the more up-to-date Metro-
politan '70 norm data. This is made possible by
the publication of equivalence tables for Stanford
and Metropolitan '70.

The revised Metropolitan Achievement Test
was standardized in 1970 on a national stratified
random population. During the period between the
standardization of the Stanford in 1963 and Metro-
politan in 1970, substantial developments occurred
in the technology of choosing stratified random
samples for normative purposes. 1/ Also, great
strides were made in computer technology and cap-
ability. Thus, the new Metropolitan norms must be
considered intrinsically superior to, i.e., more
representative than, the Stanford norms. Moreover,
no one can discount the fact that important cur-
riculum changes did occur during this period.

1/ Perhaps the most notable study in this area is
that conducted by Dr. Thomas P. Hogan entitled
"Socioeconomic Community Variables as Predictors
of Test Performance". Dr. Hogan also was respon-
sible for selecting the normative sample used in
the Metropolitan standardization program.

In the 1958 edition of Metropolitan the wide-
ly used norms were "age controlled" norms, i.e.,
norms based on those children most likely to be at
grade for age. The published Metropolitan '70
revision presently provides only norms for total
population with no elimination of over-age pupils.
Since Stanford norms used nearly all cases,
regardless of age, these are more comparable in
range of age than age controlled norms. Age
controlled norms also will be generally available
for Metro '70 shortly and in the writer's opinion
are much more serviceable at a time when there is
growing emphasis on individualization of instruction.

It is routine procedure at Harcourt Brace
Jovanovich, whenever either the Metropolitan or
the Stanford series is revised, to carry out a
careful series-to-series equating program in order
to facilitate going from one series to the other
for those who wish to evaluate differences from
series to series which may be due to norm differ-
ences. It was the existence of such tables of
equivalence, given in grade equivalent terms, that
made the above described comparisons possible in
this study.

Look now at the equivalent Metro '70 grade
equivalents shown in the column to the right of
the Stanford values. The net effect is to suggest
that New Hampshire is, in truth, performing more
nearly up to its measured capacity than indicated
on the basis of Stanford norms. For the most part,
this section of the table strongly indicates that
there should be no real concern for the level of
educational achievement in this state. The low
points at Grade 4, according to Metropolitan norms,
are Social Studies and Science, where the deficit
amounts to .2 of a calendar year; in Grade 6 both
are at the norm. In Grade 6, performance
in Arithmetic Computation and Arithmetic Concepts
shows a deficit of .2 of a year. In all other
tests the state group was at or above the Metro-
politan total population national norm.
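The comparison with the October norm can be checked with a few lines of arithmetic. The sketch below is an illustration only and is not part of the original analysis: it takes the Grade 6 median Metropolitan '70 grade equivalents that are clearly legible in Table III-C-1 and subtracts the October norm of 6.2. The function and variable names are assumptions introduced for this example.

```python
# Grade 6 median (50th percentile) Metropolitan '70 grade equivalents
# read from Table III-C-1. Tests whose entries are not clearly legible
# in the source table are left out rather than guessed.
MAT70_MEDIAN_GE_GRADE6 = {
    "Word Meaning": 6.4,
    "Spelling": 6.2,
    "Arithmetic Computation": 6.0,
    "Arithmetic Concepts": 6.0,
    "Arithmetic Applications": 6.3,
    "Social Studies": 6.2,
    "Science": 6.2,
}

# Norm for a Grade 6 group tested in October, as stated in the text.
OCTOBER_NORM_GRADE6 = 6.2


def differences_from_norm(medians, norm):
    """Return each test's median grade equivalent minus the norm, in tenths."""
    return {test: round(ge - norm, 1) for test, ge in medians.items()}


grade6_diffs = differences_from_norm(MAT70_MEDIAN_GE_GRADE6, OCTOBER_NORM_GRADE6)
below_norm = sorted(test for test, diff in grade6_diffs.items() if diff < 0)
```

On these figures only the two arithmetic tests named in the text fall below the norm, each by .2 of a year.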

Literal interpretation of these equating
tables does involve some risk of over-simplifica-
tion. For instance, the Stanford Paragraph Mean-
ing Test may measure slightly different aspects of
reading than the Metropolitan Reading Test. Cor-
relational data are lacking at this moment, but
the writer's experience over the years has been
that Stanford and Metropolitan tests correlate
about as well as two forms of either battery, es-
pecially for the basic skill subjects. It is also
noteworthy that the technique used for standardiz-
ing the Otis-Lennon Test, which is a relatively
new test, was substantially similar to that used
in standardizing the Metropolitan; in fact, Otis-
Lennon was the precursor of the Metropolitan pro-
cedure in most basic aspects.




Summary

From the data in Table III-C-1 it would seem
evident that we can consider New Hampshire a very
typical state indeed in terms of Metropolitan na-
tional norms. Much of the concern expressed at
the State Department level and in the communities
throughout the state over the poor New Hampshire
showing appears to be attributable to lack of rep-
resentativeness in the Stanford norm population,
plus some curricular changes, especially in arith-
metic. There is evidence in the tables equating
the '58 and '70 Metro editions that arithmetic
achievement has declined, especially in Computa-
tion. The writer is inclined to attribute the
shift (if it really happened) to the haphazard,
unsystematic way that modern math was introduced
in the schools, plus a generally agreed lack of
sufficient maintenance-of-skills work in the new
math.

The generalization made about average (median)
performance applies with almost equal force to the
25th and 75th percentiles, also expressed as grade
equivalents.

A word of caution is necessary. Grade equiv-
alents are never equal units from test-to-test
within the same series or from level-to-level
within the same test. Reading grade equivalents
are NOT comparable to Computation grade equiva-
lents. Standard deviations of grade equivalents
tend to vary inversely with the gain in score
points from grade-to-grade, for example. Language
generally has the smallest increment in score and
the largest standard deviation in grade equiva-
lents. Computation, generally, has a large score
increment from grade to grade and thus the small-
est standard deviation of grade equivalents. For
this reason, pupil profiles in grade scores or
grade equivalents are essentially meaningless.

The grade equivalent, as usually computed, is
based on the assumption of continuous development
over a calendar year's time since a grade equiva-
lent norm line is drawn through the median scores
for successive grades, i.e., for children differ-
ing in age and life experience by a full calendar
year. In point of fact it is a trend line through
the average scores of successive grades tested at
the same time in the school year. This ignores
differences that may occur subject-to-subject due
to differential growth (or forgetting) during the
summer vacation. The effect of summer forgetting
or non-learning can be very great. Arithmetic
computation skill, for example, is rarely learned
to any degree out of the specific instructional
environment of the school, whereas reading and vo-
cabulary continue to develop due to general life
experience and mental development. In spite of
this obvious 12-month span, grade equivalent
points, derived by dividing the total gain in raw
score from grade-to-grade into ten parts, are of-
ten erroneously called months. Since 180 days of
schooling is about maximum for most systems, even
dividing by 10 would be a doubtful procedure if
the resulting units are to be called "months";
nine months is more representative of the amount
of time between opening and closing of school, es-
pecially if within-school-year vacation time is
taken into account.
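The ten-part division described above can be sketched as a straight-line interpolation between the median raw scores of successive grades. This is a minimal sketch, assuming a linear trend between adjacent grade medians; the median values are hypothetical and are not taken from any Stanford or Metropolitan norms table, and the function name is introduced here only for illustration.

```python
def grade_equivalent(raw_score, median_by_grade):
    """Interpolate a grade equivalent from grade medians.

    The raw-score gain between two successive grade medians is divided
    into ten equal parts, each worth one tenth of a grade.
    """
    grades = sorted(median_by_grade)
    for lower, upper in zip(grades, grades[1:]):
        low_median = median_by_grade[lower]
        high_median = median_by_grade[upper]
        if low_median <= raw_score <= high_median:
            tenths = (raw_score - low_median) / (high_median - low_median) * 10
            return round(lower + tenths / 10, 1)
    raise ValueError("raw score lies outside the range covered by the medians")


# Hypothetical medians: a 10-point gain from Grade 4 to Grade 5,
# but only a 6-point gain from Grade 5 to Grade 6.
HYPOTHETICAL_MEDIANS = {4: 20, 5: 30, 6: 36}

ge_low_range = grade_equivalent(23, HYPOTHETICAL_MEDIANS)   # 3 points above the Grade 4 median
ge_high_range = grade_equivalent(33, HYPOTHETICAL_MEDIANS)  # 3 points above the Grade 5 median
```

With these made-up medians the same three raw-score points are worth .3 of a grade in one range and .5 in the next, which is the inequality of units the text warns about.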

Profiles are sensibly plotted only in terms
of grade-based standard scores with some semblance
of equality of units over the scale range. Use of
such standard scores makes peer comparisons com-
parable from test-to-test within a grade level and
generally for the same test at successive levels.
Stanines are by all means the most useful of such
standard scores. National stanines are now pro-
vided routinely but State stanines are preferable
for pupil profiles. Such, as will be shown later,
have been provided in New Hampshire.
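For readers unfamiliar with stanines, the conversion from a percentile rank can be sketched as follows. The sketch assumes the conventional normal-curve stanine percentages (4, 7, 12, 17, 20, 17, 12, 7 and 4 percent of cases in stanines 1 through 9); whether the New Hampshire State stanines were cut at exactly these points is not stated in this report, and the function name is introduced here for illustration.

```python
import bisect

# Cumulative percent of cases at the top of stanines 1 through 8,
# using the conventional 4-7-12-17-20-17-12-7-4 percent split.
STANINE_UPPER_PERCENTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank):
    """Return the stanine (1 to 9) for a percentile rank between 0 and 100."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    return bisect.bisect_left(STANINE_UPPER_PERCENTS, percentile_rank) + 1
```

A State stanine would be obtained in the same way from a percentile rank computed within the State distribution rather than the national one.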

Table III-C-2 

Although most of this analysis will be con-
cerned with interpreted scores, grade equivalents,
percentiles, stanines (local and national) and the
like, it is important to examine the raw score
characteristics of each of the tests used at the
two grades chosen for this study, namely Grades 4
and 6. An examination of these data reveals char-
acteristics of the tests in a way that cannot be
seen by any other means of analysis.

A good test, in the technical sense, for a
particular population would be one where the aver-
age score (either mean or median) is far enough
above a zero score and far enough below a perfect
score to permit all the individuals tested to in-
dicate what they are capable of doing. For ex-
ample, a median raw score of 15 out of 38 points
on Word Meaning for Grade 4 is marginal. The test
appears to be too hard. On the other hand, there
are only 38 items so the other possibility is that
the test is too short for separate interpretation.
Certainly the bottom 50% of the group cannot be
very well distributed with only a total of 14 ad-
ditional points of score available to represent
all levels of achievement from lowest to the aver-
age.
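One simple way to express this "goodness of fit" is the position of the median within the range of possible scores. The sketch below is illustrative only: it uses the Word Meaning medians and item counts cited in this section (15 of 38 at Grade 4, 25 of 48 at Grade 6), and the function name is an assumption introduced for the example.

```python
def median_position(median_raw, total_items):
    """Return the median raw score as a fraction of the total possible score."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return median_raw / total_items


# Word Meaning medians and item counts as cited in the text.
grade4_word_meaning = median_position(15, 38)
grade6_word_meaning = median_position(25, 48)
```

A value near one half leaves room both above and below the median; the Grade 4 figure falls noticeably below that point while the Grade 6 figure is close to it.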

Generally speaking, the Intermediate II Bat-
tery used at Grade 6 does a better job in this re-
spect than the Intermediate I at Grade 4. Here
there are 48 items in the Word Meaning Test and
the median for the state is 25; the 75th percent-
ile is 32, leaving plenty of "top" while the 25th
percentile of 19 leaves much more "bottom" to take
care of the less able youngsters.

A comparison of the means and the medians for
each test, given in parallel columns, will indi-
cate to some extent the skewness in the distribu-
tions.

Recall from our previous discussion that in
a skewed distribution the mean is always in the
direction of the long tail as compared to the me-
dian. None of the tests tabled here seems to be
seriously skewed, but an examination of the com-
plete distributions of scores that were made, test
by test, in the entire state in 1968-69 and again






Table III-C-1

EQUIVALENT GRADE SCORES
In Terms of Metropolitan '70 Norms
For 25th, 50th and 75th Percentile Ranks
New Hampshire Statewide Testing Program
October 1969
Statewide

                                 GRADE 4                        GRADE 6
                                 Stanford Int. I       MAT      Stanford Int. II      MAT
                          %ile   No. of  Raw    Grade  '70      No. of  Raw    Grade  '70
Test                      Rank   Items1/ Score  Equiv. G.E.3/   Items1/ Score  Equiv. G.E.3/

Word Meaning              75     38      19     4.6    4.9      48      illeg. 7.1    7.8
                          50             14     3.8    illeg.           24     5.9    6.4
                          25             9      3.2    3.5              18     4.9    5.2

Paragraph Meaning         75     60      29     4.4    5.0      64      illeg. 6.7    7.6
                          50             22     3.7    4.2              31     illeg. 6.3
                          25             16     2.9    3.2              23     4.6    5.3

Spelling                  75     50      28     4.6    5.0      56      illeg. 7.0    7.3
                          50             19     3.8    illeg.           28     5.9    6.2
                          25             13     3.2    3.4              20     5.4    5.7

Word Study Skills 2/      75     61      44     5.5    --       NO TEST
                          50             34     3.9    --
                          25             24     2.7    --

Language                  75     122     73     4.5    5.3      134     illeg. 7.1    7.9
                          50             62     3.5    4.1              82     5.7    illeg.
                          25             52     1.7    1.7              70     illeg. illeg.

Arithmetic Computation    75     39      14     4.0    4.6      39      17     5.9    7.2
                          50             11     3.6    4.1              13     5.2    6.0
                          25             8      3.1    3.5              9      4.4    5.1

Arithmetic Concepts       75     32      16     4.8    5.2      32      16     6.5    7.1
                          50             11-12  4.0    4.3              12     5.6    6.0
                          25             8      3.0    3.2              9      4.9    4.6

Arithmetic Applications   75     33      16     4.6    5.0      39      22     6.6    7.4
                          50             12     4.0    4.2              16     5.6    6.3
                          25             illeg. 3.4    3.6              11-12  4.5    4.9

Social Studies            75     49      25     4.5    4.7      74      47     6.8    7.8
                          50             20     4.0    4.0              37     5.6    6.2
                          25             15     3.5    3.5              29     4.8    5.2

Science                   75     56      31     4.6    5.0      58      37     6.7    7.7
                          50             23     3.9    4.0              30     5.6    6.2
                          25             17     3.5    3.5              23     4.4    4.7

1/ The number of items on the test.

2/ No comparable test in Metropolitan '70 Elementary or Intermediate I Battery.

3/ These are Metropolitan Total Population norms, which are most nearly
   comparable to Stanford.






Table III-C-2

Analysis of Test Characteristics Relating to "Goodness of Fit"
Of Each Test for the Level at Which It Was Used
For the Total Population Tested and for the Random Sample

Grades 4 and 6 - Statewide Testing Program Fall 1969

STANFORD ACHIEVEMENT TEST
Intermediate I Battery (Grade 4)          Intermediate II Battery (Grade 6)

                               GRADE 4                                  GRADE 6
                               Total           Random                   Total           Random
                               Population      Sample                   Population      Sample
                  No. of %ile  Select.         Select.          No. of  Select.         Select.
Test              Items  Rank  %iles   Mean    %iles   Mean     Items   %iles   Mean    %iles   Mean

Word Meaning      38     75    20.0            21.0             48      31.5            32.0
                         50    15.0    15.0    15.5    15.9             25.0    25.0    26.0    25.7
                         25    illeg.          10.5                     18.5            19.5

Paragraph         60     75    29.0            31.0             64      41.0            41.0
Meaning                  50    illeg.  23.4    23.0    24.4             31.5    32.2    33.0    32.9
                         25    illeg.          17.5                     23.5            25.0

Arith. Comp.      39     75    14.5            14.5             39      17.0            17.0
                         50    11.0    11.6    11.0    11.5             13.0    13.6    13.5    13.9
                         25    8.5             8.5                      9.5             10.0

Arith. Conc.      32     75    16.5            16.5             32      17.0            17.0
                         50    12.0    12.7    12.0    12.9             13.0    13.3    13.0    13.5
                         25    9.0             9.0                      9.5             9.5

Arith. Appl.      33     75    17.5            17.5             39      22.5            22.5
                         50    12.5    12.7    12.5    12.9             16.5    17.4    16.5    17.6
                         25    9.0             9.0                      12.0            12.0

OTIS-LENNON MENTAL ABILITY TEST
Elementary II Battery

Otis-L            80     75    43.1            --               80      65.0            --
Raw Score                50    32.9    34.0    --      --               54.7    52.4    --      --
                         25    24.0            --                       42.3            --

Otis-L                   75    110.5           111.5                    112.0           112.0
IQ                       50    100.5   100.4   102.5   100.8            102.0   102.3   102.5   102.4
                         25    90.0            93.0                     93.0            93.0






in 1969-70 reveals that the distributions tend to
be toward the low end of the scale. The distribu-
tions are not noticeably skewed, however.
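The rule that the mean lies toward the long tail relative to the median can be applied directly to the parallel columns of Table III-C-2. The sketch below is an illustration under that rule only; the (mean, median) pairs are total-population values read from the table, and the function name and the labels it returns are assumptions introduced here.

```python
def tail_direction(mean, median):
    """Return which end of the scale the long tail points toward."""
    if mean > median:
        return "high"   # scores bunch toward the low end, tail runs upward
    if mean < median:
        return "low"    # scores bunch toward the high end, tail runs downward
    return "none"


# (mean, median) pairs for the total population, from Table III-C-2.
TOTAL_POPULATION = {
    "Arithmetic Computation, Grade 4": (11.6, 11.0),
    "Arithmetic Concepts, Grade 4": (12.7, 12.0),
    "Arithmetic Applications, Grade 4": (12.7, 12.5),
    "Word Meaning, Grade 6": (25.0, 25.0),
}

tails = {
    test: tail_direction(mean, median)
    for test, (mean, median) in TOTAL_POPULATION.items()
}
```

The differences are all under one raw-score point, which is in keeping with the statement that none of the distributions is seriously skewed.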

Data for the Otis-Lennon Mental Ability Test
are shown at the bottom of Table III-C-2. This test
was administered only in the fall while the Stanford
Tests were repeated in the spring with selected
samples. The Otis-Lennon was a revision of the
Otis Quick-Scoring series, improved to provide a
better measure of "G" or general factor of intel-
ligence, which is a kind of overriding mental
ability which, taken in toto, seems to correlate
fairly highly with various criterion measures. It
yields a single measure of brightness called a de-
viation intelligence quotient (DIQ). It also has
grade oriented norms (percentile ranks and sta-
nines) which greatly enhance its usefulness. Of
the tests in the 1969-70 statewide battery, Otis-
Lennon does a better job of predicting success in
any subject matter field than does any other test.
(See the intercorrelation table in Appendix.) The
Otis-Lennon stanine, for this reason, is given a
weight of 3 when included in the composite prog-
nostic score compared to a 2 as the highest weight
given any other single test. 1/

Mental ability tests, such as Otis-Lennon,
have been criticized by many people who lack fun-
damental knowledge of mental measurement as being
unfair for many children from an environmental
point of view. The same argument, of course, can
be made for a reading test or a spelling test or
any other kind of test if one realizes that per-
formance in any of these areas is not totally de-
termined within the bounds of the instructional
program in the school. Mental ability tests have
a tremendous usefulness for teachers who under-
stand their strengths and limitations since they
provide another look at the child in the broad
spectrum of his intellectual or cognitive develop-
ment. Anyone who has studied a correlation matrix
such as that mentioned above, especially if he ex-
amines the actual bivariate charts, can't reason-
ably be worried about determinism, i.e., the self-
fulfilling prophecy.

From the raw score data, it is apparent that
this test is functioning very well since both the
median and the mean are substantially above the
chance level at both grades. The mid-value between
the median of Grade 4 and Grade 6 would be a score
of 44 out of 80 possible. (Note that the same test
is used at both grade levels.) This represents
about the best possible fit that one could obtain
for a test designed to be used over a three-grade
range, as this one is.
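The mid-value quoted above can be reproduced from the Otis-Lennon raw score medians for the total population in Table III-C-2 (32.9 at Grade 4 and 54.7 at Grade 6). The short sketch below only restates that arithmetic; the names are introduced for illustration.

```python
# Median Otis-Lennon raw scores for the total population, from Table III-C-2.
GRADE4_MEDIAN = 32.9
GRADE6_MEDIAN = 54.7
TOTAL_ITEMS = 80


def mid_value(first, second):
    """Return the point halfway between two scores."""
    return (first + second) / 2


mid = mid_value(GRADE4_MEDIAN, GRADE6_MEDIAN)
share_of_items = mid / TOTAL_ITEMS
```

The mid-value falls a little above the halfway point of the 80-item scale.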

1/ This does not mean that there are no coeffi-
cients higher than Otis-Lennon. Two arithmetic
tests may intercorrelate higher than Otis-Lennon
with either, but will not correlate as high with
other subject tests. Perhaps it is better to say
that the median of the Otis-Lennon correlation
with Stanford subtests is higher than a similar
statistic for any other test.



SECTION III
Part D

Variation Among Communities 



In this Title I report we would like to be
concerned with the performance of individual chil-
dren within each school district as the most sig-
nificant breakdown in the report. Hopefully, in
many instances we might investigate the perform-
ance of Title I children within individual schools.
However, the total number of cases available in
Title I in any one school almost always is too
small to make any valid statistical comparisons
within a school, although much more could be done
if school and district distributions were conven-
iently available. Thus one is left with the com-
parison of results from Fall to Spring for individ-
ual pupils against all Title I cases as a refer-
ence population and/or with the random sample.

However, it was possible to obtain school
district means separately by test for the 137 dis-
tricts making up the tested state population.
Since this information was in raw score form, the
resulting means are not comparable from test to
test. To meet this need, a system of standard
scores was inaugurated by making a linear trans-
formation of the raw scores for the total state
on the basis of the pupil score distributions so
that the mean score for every test administered at
each of these grades (Grades 2, 4, 6 and 8) was
assigned a value of 50 and the standard deviation
a value of 10.
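The linear transformation described above can be sketched as follows. This is a minimal sketch, assuming the usual form 50 + 10(x - mean)/SD with the mean and standard deviation taken from the pupil score distribution; the report does not say whether a population or a sample standard deviation was used, so the population form is assumed here, and the function names and sample figures are hypothetical.

```python
from statistics import mean, pstdev


def linear_standard_scores(raw_scores, target_mean=50.0, target_sd=10.0):
    """Rescale pupil raw scores linearly to the target mean and SD."""
    pupil_mean = mean(raw_scores)
    pupil_sd = pstdev(raw_scores)  # population SD, an assumption of this sketch
    if pupil_sd == 0:
        raise ValueError("raw scores must not all be identical")
    return [target_mean + target_sd * (x - pupil_mean) / pupil_sd for x in raw_scores]


def district_standard_mean(district_raw_mean, pupil_mean, pupil_sd):
    """Express a district's raw-score mean on the pupil-based standard scale."""
    return 50.0 + 10.0 * (district_raw_mean - pupil_mean) / pupil_sd


# Hypothetical pupil raw scores, for illustration only.
sample_raw = [10, 12, 14, 16, 18]
sample_standard = linear_standard_scores(sample_raw)
```

Because the scale is set from pupil scores, a district whose raw mean lies half a pupil standard deviation below the state mean receives a standard score of 45, whatever the test.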

Note these were pupil distributions. Trans- 
formation tables were generated by computer, were 
printed and made available for any who needed or 
wanted to have them. 

The description of the New Hampshire situa-
tion would be incomplete without studying the var-
iability among school districts. To accomplish
this and to facilitate making school district (or
school) profiles possible for satellite studies,
the raw score means of each school and school dis-
trict were transformed using the new linear stand-
ardized scores. Listings were made by school and
school district and were turned over to the Depart-
ment of Education for appropriate use.

In this portion of the report we present 
graphically the distributions for 137 school dis- 
tricts in the state in terms of these pupil-based 
linear standard scores. 

One not aware of the reality of community dif-
ferences cannot help but be astounded at the
spread of these standard scores on all tests, in-
cluding the Otis-Lennon. One would expect the
range of standard scores for district means to be
much less than it was, although the mean of these
distributions should approximate the mean of the
population, as it did. These distributions are
shown in Table III-D-1 and Table III-D-2 sepa-
rately by grade and subtest together with an effec-
tive graphic display (histogram) of the distribu-
tions. 1/

The value of each * varies from figure to
figure. The mode ALWAYS is set equal to 50, i.e.,
is represented by 50 *'s. The value of the * for any
particular figure can be obtained by dividing the
number of cases having the modal score by 50.
For example, in Table III-D-1 showing the distri-
bution of Otis-Lennon standard scores the modal
number is 26, so 26/50 or .52 of a case is the
value of a single *.
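The scaling rule for these histograms can be sketched in a few lines. This is not Mr. Bailey's program, which is not reproduced in this report; it is a minimal sketch under the stated rule that the modal score is always drawn with 50 asterisks. The rounding of the other rows and the example frequencies are assumptions, except for the modal count of 26 cited in the text.

```python
def star_histogram(freq_by_score, modal_width=50):
    """Return one text line per score, scaled so the mode gets modal_width stars."""
    modal_count = max(freq_by_score.values())
    if modal_count <= 0:
        raise ValueError("at least one score must have a nonzero frequency")
    lines = []
    for score in sorted(freq_by_score):
        count = freq_by_score[score]
        # Rounding to the nearest whole star is an assumption of this sketch.
        stars = round(count * modal_width / modal_count)
        lines.append(f"{score:>3} {count:>3} I{'*' * stars}")
    return lines


def cases_per_star(modal_count, modal_width=50):
    """Return the number of cases represented by a single asterisk."""
    return modal_count / modal_width


# Hypothetical frequencies, except for the modal count of 26 cited in the text.
example_frequencies = {48: 8, 49: 11, 50: 26, 51: 13}
example_lines = star_histogram(example_frequencies)
```

With a modal count of 26, one asterisk stands for 26/50 or .52 of a case, so a score held by 13 districts is drawn with 25 asterisks.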

This procedure has the virtue of making the
shape of the distributions visually comparable
from one histogram to another and also compensa-
ting for wide swings in the number of cases. For
example, it serves as well for statewide pupil dis-
tributions of 13,000+ cases as for these distribu-
tions of 137 districts.

In the Grade 4 distributions, the range for
Otis-Lennon standard scores actually is from a
standard score of 31 to 61. Discounting the one
extremely low standard score, namely 31, the range
still is from 36 to 61 with more or less a contin-
uous distribution between these points. The com-
puted standard deviation of 4.28 even more strik-
ingly indicates this variability among districts
since this approaches one-half of the assigned
standard deviation of 10 for pupil scores. The
mean of 49.28 for Otis-Lennon is perhaps unduly
influenced, in view of the relatively small number
of school districts, by the one extreme case where
the standard score assigned was only 31. Even so,
it is not far off from the established mean of 50.

Looking now at the Otis-Lennon distribution
for Grade 6, with the same 137 school districts in-
volved, we find one school district with a mean of
24, another one with a mean of 30, after which
there is a skip to 41. The two extreme values
must be discounted, but even doing this the range
from 41 to 57 is very substantial indeed. Here
again the standard deviation of 4.21 approaches
half the standard deviation of the total pupil dis-
tribution and the mean of 49.5 is approximately
that of the population.

One cannot help but ask how it can happen
that entire school districts (granting that in
some cases the number of children tested still is
quite small) can show this kind of variability.
Without a careful study of data not currently
available to this writer, of such factors as socio-
economic status within the community, amount of
money spent for education, the quality of instruc-
tion and of instructional materials and, finally,
the extent to which other unspecified factors are
influencing the data, no explanation is possible.
One only has to face the hard reality that this is
so and that there is little or no chance that this
is due to anything other than factors quite inde-
pendent of the test instruments used or any other
factor that is related to the testing program.
The data, in this writer's opinion, describe ac-
curately existing conditions within the school
districts of this state.

1/ Note: The computer program used to produce
these graphs was developed by Mr. Donald Bailey of
the University of New Hampshire, Bureau of Educa-
tional Research and Testing Services. It is an
intermediate printout in a comprehensive procedure
for obtaining local stanines and correlations.

Perhaps the best confirmation that this vari-
ability of district means on Otis-Lennon is no
chance finding lies in the fact that the distribu-
tions of achievement test means for these same
communities show the same phenomenon of wide varia-
bility in community means translated to comparable
terms by means of this rescaling technique. The
writer has chosen to use the mental ability test
for discussion purposes because it seemed advisa-
ble to escape the trap of saying the school dis-
trict showed a low mean because it did a poor job
of instruction in one field or another. This
charge cannot be made, obviously, when the test is
one that measures the general reaction of the chil-
dren in the district to solving problems relating
to their total environment and not the result of
specific in-school, curriculum oriented knowledges
and skills such as are measured by the achievement
battery.

After all, it makes little difference whether
one says that the result on a test such as the
Otis-Lennon is due to environmental influences or
to heredity or to some unknown and probably un-
knowable mixture of the two. The high and posi-
tive correlation between the results on the Otis
and the results obtained when achievement tests
are administered to the same children seems to es-
tablish, within all reasonable bounds, that there
are factors independent of the school instruction-
al program which gravely affect the amount of
learning that takes place. Any comprehensive plan
for statewide development in the curriculum to im-
prove the quality of learning must surely take in-
to account the district variability thus noted,
without at the same time neglecting the also clear-
ly demonstrable fact that even within the poorest
of these communities there exists a wide variation
in performance including levels of talent which
would challenge the best teacher. In other words,
the variability of pupil scores within the commun-
ity is still very large when translated into
standard scores even when the community mean is
low or high.

The inclusion of cumulative percents in these
tables makes possible other interesting analyses.
Assuming that one has a community profile avail-
able, it is possible to interpret the status of
the community as reflected in transformed mean
scores into statements such as:

"Community X has a standard score mean of 47
on Otis-Lennon, which is comparable to the 23rd
percentile for the 137 districts. In other words
23% of the 137 school districts shown have an






Table III-D-1

Frequency and Cumulative Percent Distributions of Linear Standard Scores
Corresponding to School District Means for 137 School Districts in
New Hampshire Testing in the Fall of 1969

Grade 4

Otis-Lennon Mental Ability Test: Elementary II Battery: Form J

Mean = 49.28     Standard Deviation = 4.28

[Histogram of district-mean standard scores with SCORE, PERCENTILE and FREQUENCY columns. Scores run from 31 to 61; the modal frequency is 26.]

Stanford Achievement Test: Intermediate I Battery: Form X

Word Meaning

Mean = 49.71     Standard Deviation = 4.03

[Histogram of district-mean standard scores with SCORE, PERCENTILE, STANINE and FREQUENCY columns.]






Table III-D-1
(Continued)

Grade 4

Stanford Achievement Test: Intermediate I Battery: Form X

Paragraph Meaning

Mean = 49.50     Standard Deviation = 3.62

[Histogram of district-mean standard scores with SCORE, PERCENTILE, STANINE and FREQUENCY columns.]

Arithmetic Computation

Mean = 48.82     Standard Deviation = 4.51

[Histogram of district-mean standard scores with SCORE, PERCENTILE, STANINE and FREQUENCY columns.]








Table III-D-1 
(Continued) 



Grade 4 

Stanford Achievement Test: Intenoediats I Battery: Form X 
Arithxoetic Concepta 



SCC«<t PP«CFtTItF STA 

^S I 

36 1 

^7 1 

^fJ 1 

39 I 

hn 4 

*1 6 

4'> « 

4J }/ 

44 IS 

4«i <'0 

46 2^ 

47 

4«< 4? 

49 S/ 
50 

51 7S 

»*^ 

S4 
55 

57 
58 

S»> 0*1 
HO 



tNE FREQUENCY 

t i^^ Mean - 48.66 

0 I 

0 I 

0 I Standard 

3 Deviation - 4.,03 

3 
5 
5 

7 I 

1 5 I ••••• 

9 {^^••••••«*«««««#* 

lu I 

•4 I •••••••••••••••••••••••••••••• 

I 

A I 

-) I •••••• 

0 I 

0 I 

0 I 
I 



Arithmetic Applications



SCORE PERCENTILE STANINE

3^ 1 

36 1 

37 I 
^A I 

^<9 I 

40 1 

41 1 

42 » 

43 T 

44 II 

45 IS 

46 . .***» 
4 7 2'^ 
4/* ^1 
40 50 

50 7' 

51 a) 
5;» »»T 

5^ 
54 
55 

56 **d 
il? 

SB 0" 

50 gr, 



FREQUENCY

1 I* Mean « 49.09 

0 I 

0 I 

0 I Standard

J } Deviation = 3.67

1 I* 

0 I 

Z 

5 !•* 

/, 

6 I ••••••••• 

12 I •••••• 

U 

, ; I 

31 I 

10 ••<••*• 

3 

I* • 

^ I ••••••••• 

1 I* 

0 I 

1 I* 

2 !♦•• 






Table III-D-2 

Frequency and Cumulative Percent Distributions of Linear Standard Scores
Corresponding to School District Means for 137 School Districts in
New Hampshire Testing in the Fall of 1969

Grade 6 



Otis-Lennon Mental Ability Test: Elementary II Battery: Form K



SCORE 


PERCENT 


24 


1 


25 


1 


26 


1 


27 


1 
• 


2S 


\ 


29 


1 


30 


1 


31 


1 


32 


I 


31 


I 


34 


I 


35 


1 


36 


I 


37 


I 


3S 


1 


3V 


1 


40 


1 


41 


2 


42 


2 


43 


6 


44 


f 


45 


10 


46 


IS 


47 


?3 


48 


34 


49 


42 


50 


bS 


51 


72^ 


52 


SO 


53 


91 


54 


93 


55 


96 


96 


9S 


5T 


99 



I !•« 

0 I 

0 I 

0 I Mean » 49.50 

0 I 

0 I 

1 i«« Standard 

2 ! Deviation » 4.21 

0 I 

0 I 

0 I 

0 I 

0 I 

0 I 

0 I 

0 I 

0 { 
I 

0 I 

5 |««««««*«««« 

2 

1 1 I ••««•••*•••••••••••••••••* 

12 ivtM************************ 

14 - I ••••••• -•^••^•••••••••••••••««« 

4 I ••••••••• 

y t 

3 t ••••••• 

3 |«««^*«« 



Stanford Achievement Test: Intermediate II Battery: Form X 
Word Meaning 



SCORE 
39 
40 
41 
42 
43 
44 
45 
46 
47 
4S 
49 
50 
51 
52 
53 
54 
55 
56 
57 
5S 



PERCENTILE STANINE



S 

u 
l» 

31 
40 
5^ 
64 
71 
S3 
90 
93 
96 
9T 
99 
99 



FREQUENCY

I 1*6 



Mean - 49.34 



Standard
Deviation



0 I 
I 

1 166 

4 |«*«««««S6# 

4 t 

4 {••^•••M* 

5 |MiiS««»««««6«»««««««» 

20 t6«6#««««««*««»««««*«««»««««««»«**«« 

9 |#«6S0#*«#«*#««««%»6eS« 

$ f •696*M#^««« 

3 {••6«66« 

Z 1f6#S« 

2 |6«««« 
2 I6M«« 



3.46 






Table III-D-2 
(Continued) 

Grade 6 



Stanford Achievement Test: Intermediate II Battery: Form X 
Paragraph Meaning 



SCORE PERCENTILE STANINE FREQUENCY



^0 
^1 
^2 

«6 
47 
4S 

*0 
SI 

5? 
^\ 

56 

*>r 

SO 



1 
1 
1 
2 

11 
17 

52 
70 
70 

<t9 
•^l 
94 
9ft 
99 
99 
•<•) 
Ovj 



1 Mean - 49.28 

0 I 

1 Standard 

9 Deviation • 3.05 

25 I 
12 

I •••••••••••••••••••••••••••• 

4 I •••••••• 

0 1 
0 I 



Arithmetic Computation 



SCORE 


PERCENTILE 


STANINE 


FREQUENCY 


42 


4 


I 


6 




43 


10 


2 


6 




44 


16 


3 


6 




4S 


26 


3 


13 




46 


32 


4 


9 




47 


39 


4 


9 




48 


49 


5 


14 




49 


56 


5 


10 




50 


64 


6 


11 




51 


77 


6 


10 




52 


80 


6 


12 




5) 




7 


7 




54 




7 


3 




55 


03 




6 




56 


95 


3 


2 




57 


97 




3 


Stt 


^'i 


9 


1 


!• 


59 


9R 


9 


0 


I 


60 


99 


9 


1 


!• 


61 


99 


) 


0 


I 


6? 


99 


9 


0 


1 


6^ 


99 


0 


0 


1 


64 


91 


9 


2 





Mean = 49.04
Standard Deviation = 4.45






Table III-D-2 
(Continued) 



Grade 6 



Stanford Achievement Test: Intermediate II Battery: Form X
Arithmetic Concepts



SCORE


PERCENTILE


STANINE


FREQUENCY


39 


1 


I 


1 




40 


1 


I 


0 


I 


41 


1 


1 


0 


I 


42 


2 


1 


2 




43 


4 


I 


2 




44 




2 


6 


1 


45 


14 
23 


3 


8 


I 


46 


3 


12 


1 


47 


2« 


4 


fl 


I 


4M 


43 


4 


19 


1 


49 


So 




19 


I 




ST 


6 


15 




51 


74 


6 


9 


I 


52 


ei 


7 


13 


I 


5T 


H9 


7 


8 


1 


54 

55 


93 
96 


ft 
n 


5 
4 


Mean « 49.29 


5o 


9H 




3 




5T 


94 


Q 


0 


1 Standard 


5H 
59 


9^ 
90 


9 


0 
0 


I Deviation « 3.65 


&<> 


99 


9 


1 




6t 


99 


9 


2 





Arithmetic Applications 



SCORE 
33 
54 
35 
36 
37 
3H 
3<} 
40 
41 
42 
4) 
44 
45 
46 
47 
4R 
49 
50 
5J 
52 
5^ 
54 
55 
56 
57 
5R 



PERCENTILE STANINE FREQUENCY



1 
I 
1 
I 
1 
I 
1 
I 
1 
? 

4 
T 
11 
18 
^T 
40 
51 
66 
T4 
Rl 
91 
96 
9T 
99 
W4 
99 



1 

0 
0 
0 
0 
0 
0 

1 

0 

I 

3 
3 
6 
10 
12 

la 

15 

20 
12 
9 
13 
7 
2 
2 
0 
2 



I 



Mean = 49.30
Standard Deviation = 3.51



I 

t««u«««« •••••••• ••••••• 

I 






Table III-D-3
Intercorrelations of District Means on Selected
Stanford Achievement Tests in Standard Score Form
Fall 1969

Grade 4

                 Word     Para.    - - - - A r i t h m e t i c - - - -   Otis-
                 Meaning  Meaning  Computation  Concepts  Applications   Lennon

Word Meaning      1.00
Para. Meaning      .84     1.00
Computation        .54      .50       1.00
Concepts           .69      .71        .67        1.00
Applications       .66      .69        .67         .80       1.00
Otis-Lennon        .79      .79        .51         .76        .73         1.00

Grade 6

                 Word     Para.    - - - - A r i t h m e t i c - - - -   Otis-
                 Meaning  Meaning  Computation  Concepts  Applications   Lennon

Word Meaning      1.00
Para. Meaning      .84     1.00
Computation        .37      .38       1.00
Concepts           .66      .61        .65        1.00
Applications       .62      .60        .63         .80       1.00
Otis-Lennon        .62      .63        .34         .54        .52         1.00






Figure III-D-1 

A SAMPLE PROFILE OF NEW HAMPSHIRE SCHOOL DISTRICT MEANS 

Fall 1969 

Grade 4 



[Profile chart. Vertical axis on both sides: Standard Score,
35 to 60. Columns: Stanford Achievement Test, Intermediate I
Battery, Form X (Word Meaning, Para. Meaning, Arith. Comp.,
Arith. Concepts, Arith. Appl.) and Otis-Lennon, Elem. II,
Form J. The plotted profile lines are not reproduced.]






N.H. Statewide Testing Program Evaluation - IV



equal or lower standard score mean in measured
mental ability."

In Table III-D-3 the intercorrelations between
community means expressed as standard scores are
given. As in all such correlation tables, the 1.00
entries on the first diagonal (top left to lower
right) simply reflect the fact that the
relationship between identical scores on a
particular test would, of course, be 1, or perfect.

These intercorrelation tables can effectively be
compared with the similar intercorrelations of the
tests involved when pupil scores are used, not
district means. The coefficients based on district
means tend to run somewhat lower because of a
greater restriction in range. Perhaps the most
significant line of figures on these tables is to
be found in the bottom row, where the correlations
are reported between Otis-Lennon raw scores and
each of the Stanford subtests included in this
study.

Generally these correlations will be somewhat
lower than the pupil correlations between the
same pairs of tests, but the interesting fact is
that these correlations are high as compared to
many others reported in the table.
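
As a rough illustration of how an entry in such a table is obtained, the sketch below computes a product-moment correlation between two columns of district means. The district values are hypothetical and are not taken from this report; only the formula is standard.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical district means in standard-score form (not data from the report).
otis_lennon_means = [46.0, 48.5, 49.0, 50.5, 52.0, 54.5]
concepts_means = [45.5, 49.0, 48.0, 51.0, 51.5, 55.0]

r = pearson_r(otis_lennon_means, concepts_means)
print(f"r between district means = {r:.2f}")
```

Each district contributes one pair of means, so the number of pairs is the number of districts rather than the number of pupils.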

Attention is also called to the correlations
between Arithmetic Concepts and the various other
tests in the battery, which also tend to run high,
reflecting the fact that the Concepts test on
Stanford is really a kind of indirect measure of
mental ability. This is largely due to the fact
that it reflects the knowledge of general
principles taught as a part of the Mathematics
program, which are found to be very difficult by
the less able pupils who have not reached a stage
of maturity sufficient to permit them to work in
terms of general principles.

One must recall that these data actually are
correlations between means, so that we are again
emphatically reminded that a community or district
has a character of its own, not unlike the
individual characteristics of a child, and without
any doubt this character of the community and of
the schools within the community puts its stamp on
the quality and effectiveness of the educational
program. To put this differently and in more
pragmatic terms, it just is not reasonable to
expect that a community which tends to run
substantially below the population as a whole,
which in this case is the State of New Hampshire,
is going to achieve results on a standardized test
up to the norm on the test. Another important
concomitant, since we can infer a fairly close
relationship between these test data and certain
other sociological factors, is that larger
proportions of Title I children will be found in
the communities where the school district averages
tend to be on the low side.

To emphasize how communities can differ without
stressing the point unduly, a profile has been
prepared using the linear standard scores, on which
are graphed the results for three communities
selected by the writer as being representative of
communities at the top, middle and lower parts of
the distribution of standard scores corresponding
to school district means. It would be possible to
dwell on these profiles at some length, but for the
purposes of this report this probably is not
necessary. It is interesting, however, to point out
that the community which is highest in terms of
its average Otis-Lennon standard score is only
doing average work in Arithmetic Computation. The
writer leaves to the reader the task of making his
own interpretation of the significance of this
fact. (See Figure III-D-1 on page 32.)



SECTION IV 



Description of the Title I Population
From the "IN" and "OUT" Cards



The "IN" and "OUT" Cards as Control Documents

In Section II, delineating the changes made
in the 1969-70 program, there is a discussion of
the design and use of the "IN" and "OUT" cards.

One of the major advantages of the "IN" and
"OUT" card procedure was the categorizing of Title
I projects submitted and approved by the local
communities. The following series of tables, from
IV-1 to IV-8, summarizes the data for the Title
I population obtained from the "IN" and "OUT"
cards.

Table IV-1

In Table IV-1, the distribution of cases by
category is shown. This table includes all Title
I cases for whom "IN" cards were available. It is
evident from the table that the Title I program is
concentrated in the lower grades. (Data for Grades
3, 5, 7 and 9 are missing from this report since
these grades were not involved in the testing
program. The data are available, however.)

The data for Grades 2 and 4 indicate in excess
of 800 cases in each instance, whereas the
number of enrollees in the program in Grades 6 and
8 is between 400 and 500 cases, a substantial drop.

Most notable, also, is that the large majority
of Title I children are involved in corrective
and supportive reading programs. Even though the
numbers of cases vary somewhat from grade to grade,
the percentage of cases in corrective reading
programs remains relatively constant, varying from
a low of 64% in Grade 6 to a high of 81% in Grade 8.
The percentage of the children in other projects
listed according to our set of categories is so
small that separate analysis is not justified.
Hence, the only breakdowns used in this report will
be for the total Title I group and, to the extent
that it is possible, separate data analysis for
those in the reading programs.
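
The percentages quoted for the reading category can be checked directly from the counts in Table IV-1. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative only, and rounding to the nearest whole percent is assumed.

```python
# Reading-project enrollment and total Title I enrollment by grade,
# as given in Table IV-1.
reading_cases = {2: 657, 4: 606, 6: 286, 8: 398}
total_cases = {2: 960, 4: 824, 6: 445, 8: 494}

def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

reading_percent = {
    grade: percent(reading_cases[grade], total_cases[grade])
    for grade in total_cases
}

for grade, pct in reading_percent.items():
    print(f"Grade {grade}: {pct}% of Title I cases in reading projects")
```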




part at the beginning of the school year, with
only a scattering of children coming into the
program throughout the year. Thus it is apparent
that the decision as to who shall be included in
the program must have been made prior to the close
of school in the previous year. This is also
meritorious, assuming, of course, that the selection
of these children has been made on the basis of
adequate information.

Table IV-6

The duration of the Title I experience within
the school year 1969-70 (Table IV-6), as indicated
on the "OUT" card, varies greatly, although a
majority of the children do stay in the program
throughout the school year. Apparently some
children are discharged from the program at various
times, hopefully as their progress indicates that
they are ready to move back into the regular
stream of instruction.

This is in itself a good idea, provided, of
course, that there is objective evidence at the
time the child is removed from the program that he
has indeed satisfied the goals set up for him at
the beginning of the year.

The disadvantage of this is that if Fall and
Spring testing is undertaken a substantial number
of excused children, i.e., cases exiting before
testing time, would not be tested in the Spring
unless the school districts cooperate in carrying
out the instruction that all Title I children,
whether or not they had been discharged from the
program, should be so tested. There is evidence
that this was not done.

Table IV-7

In Table IV-7 we have an analysis of the
reasons why pupils were discharged from the Title
I project. The most obvious reason is, of course,
that the school year was terminated and the
inclusion or exclusion of a student from the program
during the subsequent 1970-71 year had not yet
been determined. The number of pupils leaving
school for one reason or another, including
transfers, appears to be negligible, but the
percentage of those discharged from the Title I
project because of satisfactory progress is fairly
substantial. Using again the N for the tested group,
the percentage of cases in Grade 4 is 7% and in
Grade 6, 10%. In a sense, the number of individuals
who are indicated as having made satisfactory
progress or, better yet, the percent of these
individuals as compared to the total group, is a
measure of the success of the program if it is
really directed toward remediation in some area
such as reading or mathematics. The percent
discharged is not impressive from this point of view,
especially in reading. Pupils in the 4th and 6th
grade, if the selection has been done carefully
and the diagnosis is extensive, should respond to
special help to the point where 50% or better
would be able to be discharged from the program
and sent back to their regular classes. This
report does not utilize the U.S. Office of Education
standard of "one year's progress in school for one
year of school attendance" as a criterion because
it is considered by this author to be totally
unreasonable. However, the experience of this
writer in directing a corrective reading program for
Pinellas County, Florida, where careful selection
followed by detailed analysis was made, indicates
that the percentage could be as high as 75 or 80%
under proper conditions of selection and analysis,
a carefully prescribed program of remediation, and
effective corrective teaching.

Table IV-8

In Table IV-8, the teachers had an opportunity
to indicate the level of progress made by
each pupil in a Title I project. The table gives
the number and percent of responses to each choice
separately for the reading group and the total
tested group. About 40 to 45% of both 4th grade
pupils and 6th grade pupils were indicated as
having made excellent progress, while another 40%
were rated as having made modest but presumably
significant progress. The percentage for whom
only minor changes were evident or for whom no
real benefit was reaped by the program is
consistently small, not exceeding 14% of the total
group in reading in Grade 4.

One must interpret these data as indicating a
very optimistic outlook on the part of the teachers
responsible for Title I instruction in light
of the data comparing Title I pupils with the
random sample tested Fall and Spring. Is it possible
that expectation of progress is too low, so a
small gain is credited too highly? One must draw
his own conclusions after examining the data in
this section.






Table IV-1

New Hampshire Statewide Testing Program 1969-70
Number and Percent of Cases Enrolled
in Each Project Category
Title I Program

                                      G R A D E
Type of                     2           4           6           8
Project                  No.    %    No.    %    No.    %    No.    %

 1. Language              31    3     17    2     11    2     16    3
 2. Reading              657   68    606   74    286   64    398   81
 3. Speech                69    7     26    3     18    4      4    1
 4. Math                   6    1     55    7     42   10      2    0
 5. Guidance              79    8     56    7     27    6     19    4
 6. Special Education      3    0      2    0      3    1      8    2
 7. Psychiatric Services  14    2      0    0      0    0      0    0
 8. Aides                 50    5     27    3     22    5      6    1
 9. Cultural Enrichment   16    2     25    3      5    1     25    5
10. Other                 35    4     10    1     31    7     16    3

Totals                   960  100    824  100    445  100    494  100



Table IV-2

New Hampshire Statewide Testing Program 1969-70
Distribution of Total Number of Hours of
Instruction for 1969-70 Title I Pupils
Separately for Reading and All Projects Combined

Grade 4

Hours/Week             Reading    All Cases

1                         26          74
2                        104         116
3                         30          37
4                         69          70
5                        107         117
6                          0           0
7                          0           0
8                          0           0
9                          8           8
Full Time                  0           6
Total No. of Cases       344         428

Grade 6

Hours/Week             Reading    All Cases

1                         13          43
2                         83          86
3                         31          32
4                         28          28
5                         18          31
6                          3           4
7                          0           0
8                          0           1
9                          0           0
Full Time                  0          13
Total No. of Cases       176         238






Table IV-3

Number and Percent of Pupils
By Type of Instructional Personnel Involved
Title I - 1969-70

Grade 4

                                 Reading      Total Group
Instructor                      No.    %      No.    %

Reg. Classroom Teacher Only       0    0        0    0
Outside Person or Agency          0    0        7    0
Special Teacher In:
  Language                        0    0        6    1
  Reading                       337   98      343   80
  Speech                          0    0       17    4
  Math                            0    0       10    2
  Guidance                        0    0       25    6
Aide                              7    2       14    3
Other                             0    0        6    1
Total                           344           428

Grade 6

                                 Reading      Total Group
Instructor                      No.    %      No.    %

Reg. Classroom Teacher Only       0    0       13    6
Outside Person or Agency          0    0        0    0
Special Teacher In:
  Language                        0    0        3    1
  Reading                       166   94      170   72
  Speech                          0    0        9    4
  Math                            0    0        8    3
  Guidance                        0    0       20    9
Aide                             10    6       11    5
Other                             0    0        1    0
Total                           176           235


Table IV-4

Number and Percent of 1969-70 Title I Pupils
Who Were In Title I Projects
In 1968-69 School Year

Grade 4

                    Boy          Girl         Total*
                 No.    %     No.    %     No.    %

Reading
  Yes             94   27      41   12     135   39
  No             102   30      95   28     198   58
  Don't Know       8    2       1    0       9    3
                 204   59     137   40     342  100

All Cases
  Yes            118   28      54   13     172   40
  No             120   28     107   25     228   54
  Don't Know      20    5       6    1      26    6
                 258   61     167   39     425  100

Grade 6

                    Boy          Girl         Total
                 No.    %     No.    %     No.    %

Reading
  Yes             57   33      39   23      96   56
  No              48   28      22   13      70   41
  Don't Know       6    3       0    0       6    3
                 111   64      61   36     172  100

All Cases
  Yes             83   36      52   22     135   58
  No              53   23      31   13      84   36
  Don't Know       9    4       5    2      14    6
                 145   63      88   37     233  100

*Total Includes Pupil Who Did Not Code Sex






Table IV-5

Entry Date for Children
in the 1969-70 Title I Program (Tested Sample)

Grade 4

                     Reading               All Cases
Entry Month      Boy  Girl  Total      Boy  Girl  Total

September        137    79    216      162    92    254
October           15     8     23       23    12     35
November           0     2      2        0     2      2
December          13    11     24       13    12     25
January            2     3      5        2     3      5
February           2     0      2        2     0      2
March              8    10     18        8    10     18
April              1     0      1        1     0      1
May                0     0      0        0     0      0
June               2     5      7        2     5      7
Total            180   118    298      213   136    349

Grade 6

                     Reading               All Cases
Entry Month      Boy  Girl  Total      Boy  Girl  Total

September         75    45    120       88    54    142
October            2     4      6        4     8     12
November           6     0      6        6     1      7
December           9     3     12       10     3     13
January            3     2      5        3     2      5
February           1     0      1        1     0      1
March              1     1      2        1     1      2
April              0     0      0        0     0      0
May                0     0      0        0     0      0
June               0     0      0        0     0      0
Total             97    55    152      113    69    182


Table IV-6

Duration of Title I Experience
For All Available Cases Tested
in the Spring of 1970

Grade 4

                        Reading               All Cases
Duration (Mos.)     Boy  Girl  Total      Boy  Girl  Total

0 (No Out Cards)      3     5      8        3     5      8
1                     0     0      0        0     0      0
2                    10     2     12       10     2     12
3                    19    19     38       20    19     39
4                    12     7     19       12     7     19
5                     3     4      7        3     5      8
6                     4     2      6        7     3     10
7                     6     2      8       11     2     13
8                    66    48    114       69    51    120
9                    57    29     86       78    42    120
Total               180   118    298      213   136    349

Grade 6

                        Reading               All Cases
Duration (Mos.)     Boy  Girl  Total      Boy  Girl  Total

0 (No Out Cards)      0     0      0        4     4      8
1                     0     0      0        0     1      1
2                    11     7     18       12     7     19
3                    11     5     16       11     6     17
4                     6     4     10        6     4     10
5                     2     1      3        2     1      3
6                     2     2      4        3     2      5
7                     6     1      7        6     1      7
8                     4     1      5        5     4      9
9                    55    34     89       64    39    103
Total                97    55    152      113    69    182






Table IV-7

Reason for Termination of Participation
in 1969-70 Title I Project
Separately for Reading and Total Group

Grade 4 





R 


E A D 


I N G 




Reason Boy 


Girl 




Total^ 


No. 




No. 


% 


Kd. 7. 


Satisfac. Progress 13 


5 


10 


3 


23 8 


T^ft School 0 


0 


1 


0 


1 0 


End of Sch. Year 163 


55 


102 


35 


256 90 


Other 4 


1 


3 




7 2 


Total 180 




116 " 


m 


iOOX 




TOTAL 


G R 0 


u . 


Boy 


Girl 




Total>'' 


No. 


% 


No. 


% 


No. X 


Satisfac. Progress 15 


A 


11 


3 


26 7 


Lett School 0 


0 


2 




2 1 


2nd of Sch. Year 190 


55 


lit 




309 89 


Other _fc 


2 


5 




11 3 


Total 211 


6Tz 






348 TOOX 




Grade 6








R 


E A D 


I N G 




Reason Boy 


Girl 




Total 


No. 


% 


No. 


% 


No. X 


Setisfac. Progress 8 


5 


7 


5 


15 10 


Left School 0 


0 


0 


0 


0 0 


End of Sch* Year 67 


58 


46 


30 


133 88 


Other __ 1 


1 


2 


1 


3 2 


Total 90" 


64% 


55 


36% 


151 lOOX 




TOT 


A L 


G R 0 


U P 


; oy 


Girl 


Total 


No. 


X 


No. 


X 


No. X 


Satisfac. Progress 9 


5 


9 


5 


18 10 


Left School 0 


0 


0 


0 


0 0 


End of Sch. Year 102 


56 


58 


32 


160 88 


Other 1 


1 


2 


1 


3 2 


Total 112 


627. 


69 


38X 


181 lOOX 
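
The discharge percentages cited in the discussion of Table IV-7 (7% in Grade 4 and 10% in Grade 6 for the total group) follow from the "Satisfac. Progress" counts above. A small sketch of that check is given here; the names are illustrative, and the 50% figure is the expectation suggested in the text, not an official standard.

```python
# Total-group counts from Table IV-7: pupils discharged for satisfactory
# progress, and all pupils with a recorded reason for termination.
discharged = {4: 26, 6: 18}
all_cases = {4: 348, 6: 181}

# The 50% figure is the author's suggested expectation, not a federal standard.
AUTHOR_BENCHMARK_PERCENT = 50

discharge_percent = {
    grade: round(100 * discharged[grade] / all_cases[grade])
    for grade in all_cases
}

for grade, pct in discharge_percent.items():
    shortfall = AUTHOR_BENCHMARK_PERCENT - pct
    print(f"Grade {grade}: {pct}% discharged, {shortfall} points below the benchmark")
```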



Table IV-8

Teacher Judgment
Concerning the Success of the Program
Title I 1969-70 Separately by Sex and
Separately by Reading versus Total Group

Grade 4

READING
                       Boy          Girl         Total*
Success             No.    %     No.    %     No.    %

Excell. Progress     79   27      53   17     133   44
Modest Progress      72   24      53   18     125   42
Minor Change         22    7       9    3      31   10
No Real Benefit       8    3       2    1      10    4
Total               181   61%    117   39%    299  100%

TOTAL GROUP
                       Boy          Girl         Total*
Success             No.    %     No.    %     No.    %

Excell. Progress     94   27      62   18     157   45
Modest Progress      87   25      62   18     149   43
Minor Change         24    7      10    3      34   10
No Real Benefit       8    2       2    0      10    2
Total               213   61%    136   39%    350  100%

*Totals Include Pupils Who Did Not Code Sex

Grade 6

READING
                       Boy          Girl         Total
Success             No.    %     No.    %     No.    %

Excell. Progress     39   26      26   17      65   44
Modest Progress      39   26      27   18      66   44
Minor Change         13    9       1    1      14    9
No Real Benefit       3    2       1    1       4    3
Total                94   63%     55   37%    149  100%

TOTAL GROUP
                       Boy          Girl         Total
Success             No.    %     No.    %     No.    %

Excell. Progress     47   26      36   20      83   47
Modest Progress      47   26      30   17      77   43
Minor Change         13    7       2    1      15    8
No Real Benefit       3    2       1    1       4    2
Total               110   61%     69   39%    179  100%






N.H. Statewide Testing Program Evaluation - V



SECTION V

The Random Sample

The Need for Random Sample Testing Program

The Stanford authors have provided no really
meaningful way of comparing results in a test-
retest situation over the relatively short period
of time from Fall to Spring. 1/ How then can one
interpret such re-test data for special subgroups
such as Title I? To provide for this situation,
the New Hampshire 1969-70 statewide program under
Title I auspices initiated the Spring re-testing
of a random sample of cases selected from the
entire population tested in the Fall to provide a
state Spring "norm" group at each grade level.

The random sample was carefully drawn, using
appropriate computerized statistical methods.
Approximately 1500 children per grade from those
tested in the Fall were identified to be retested.
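
The report does not describe the drawing procedure in detail. As a minimal sketch, assuming simple random sampling without replacement from the list of Fall examinees, the selection for one grade might look like the following. The ID list, the seed, and the function name are hypothetical.

```python
import random

def draw_retest_sample(fall_ids, sample_size, seed):
    """Draw a simple random sample, without replacement, of Fall examinee IDs."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return sorted(rng.sample(fall_ids, sample_size))

# Hypothetical ID list; the report cites roughly 11,000 to 12,000 Fall cases.
fall_ids = list(range(1, 11501))
retest_ids = draw_retest_sample(fall_ids, sample_size=1500, seed=1969)
print(len(retest_ids), "pupils selected for Spring retesting")
```

Recording the seed alongside the selected IDs would allow the same list to be regenerated if the matching of Fall and Spring records were later questioned.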

Tests were provided by Title I for all of
these children, but the number of cases actually
tested or, at least, for whom results were finally
available for analysis, was typically less than
50% of those selected at each grade level. 2/ This
raised a serious question as to the
representativeness of the TESTED random sample.
However, when the paired cases from the random
sample that were tested in the spring were finally
available for both fall and spring and were compared
with the total sample for fall testing, it was
concluded that the differences, although some did
exist, were not of practical significance and that
the data for the partial random sample (now
necessarily thought of as representative rather
than random) provided a good guide as to the amount
of gain to be expected by a cross-section group
within this State over the seven month period
between Fall and Spring testing, i.e., from October
to May. Thus a much more realistic basis for
evaluating Title I performance was made available
than would otherwise have been the case.

Determining the Representativeness of the Tested 
Random Sample 

The procedure for determining the representa- 
tiveness of the tested random sample for any test 
included making a distribution of the scores for 
the selected cases from the Fall test results and 
plotting this Fall sample distribution on an Otis 
Normal Percentile Chart on which the total state 
Fall distribution was also plotted. 

1/ They are not unique in this respect.
Tests such as Stanford, California Achievement or
Metropolitan etc. never were intended for retesting
over short periods of time. A new technology
is involved and test makers are hard at work on
this problem.

2/ Some cases who supposedly were tested were lost
because insufficient matching ID information was
available.



It would not be sufficient to make this
comparison on the basis of means and standard
deviations alone or of any other simple set of
statistics that did not describe the entire
distribution. The only satisfactory way of making
this comparison in a manner that would be clearly
understood by anyone reading this report was to
compare the distributions graphically, at least for
some of the tests involved. The Otis Normal
Percentile Chart is normal probability paper prepared
especially for plotting distributions of test scores
where it is desired to determine first, whether
the distribution approximates a normal curve, and
secondly, to compare a number of distributions
which represent samples presumably comparable.
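
For readers unfamiliar with normal probability paper, the sketch below shows the transformation such a chart performs: each cumulative percent is converted to the corresponding deviate of the standard normal curve, so that a normal distribution plots as a straight line. This is an illustration only, not the procedure used to prepare the charts in this report; the frequency data are hypothetical, and taking the cumulative percent through the midpoint of each score's frequency is one common convention that is assumed here.

```python
from itertools import accumulate
from statistics import NormalDist

def normal_plot_points(score_counts):
    """Return (score, normal deviate) pairs for a cumulative distribution.

    score_counts maps each score to its frequency. The cumulative proportion
    at each score is taken through the midpoint of that score's frequency,
    which keeps the top score away from 100 percent. Scores with zero
    frequency at either extreme should be omitted before calling.
    """
    scores = sorted(score_counts)
    total = sum(score_counts.values())
    running = list(accumulate(score_counts[s] for s in scores))
    std_normal = NormalDist()
    points = []
    for score, cum in zip(scores, running):
        proportion = (cum - score_counts[score] / 2) / total
        points.append((score, std_normal.inv_cdf(proportion)))
    return points

# Hypothetical frequency distribution of deviation IQs (not data from the report).
example = {85: 2, 90: 5, 95: 12, 100: 20, 105: 12, 110: 5, 115: 2}
for score, z in normal_plot_points(example):
    print(f"{score:>3}  z = {z:+.2f}")
```

Plotting score against the deviate for both the sample and the total group on the same axes gives the kind of comparison the chart is designed for.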

This was done first for the Otis-Lennon Mental
Ability Test deviation IQs (DIQ) and the results
for Grade 4 and Grade 6 are shown on the
Normal Percentile Charts labeled Charts V-1 and
V-2. Unit increments of one point are plotted and
the line was drawn by connecting plotted points,
thus allowing the greatest possible variation from
one graphed line to the other. Smoothed lines are
often suspect unless some statistically fairly
precise way of smoothing is used or the person who
does the graphs has had extensive experience at
this task. In this situation, such refined
smoothing seemed superfluous.

In view of the relatively small number of
cases in the tested random sample, one would
naturally expect a less smooth curve for the sample
than for the total populations of about 11,000 or
12,000 cases tested in the fall. The general trend
of the plotted line for the tested random sample,
however, did reveal some slight systematic
differences between the total group and the small
sample.

The "tails" of the graphed distributions for 
both grades reproduced in this report have been 
curtailed for reasons of space. The total range 
of IQs for the state is shown in tabular form in 
another part of this report. 

Looking first at the plotted line for the to- 
tal sample, it will be seen that this line approx- 
imates a straight line for the major part of its 
length. The tested random sample also is reason- 
ably straight but shows a tendency to be somewhat 
above the total population, especially in the lower 
part of the curve, but approaches the state graph 
more closely at the top of the curve. Minor devi- 
ations must be considered to be of negligible im- 
portance for the purpose to be served here. 

Our conclusion from a study of the Grade 4
chart is that the tested sample is slightly
superior in measured "brightness" to the total
population, especially in the lower range. At the
median the difference amounts to about a point and
a half, while at the 10th percentile this difference
is perhaps two and a half points in terms of DIQ.
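
The same comparison can be made numerically by taking the difference between sample and population at selected percentiles. The sketch below assumes linear interpolation between ordered scores, which is only one of several percentile conventions, and uses hypothetical DIQ values chosen to illustrate the calculation.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between sorted values (0 <= pct <= 100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    position = (len(ordered) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def percentile_gaps(sample, population, pcts=(10, 50, 90)):
    """Sample minus population at each requested percentile."""
    return {p: percentile(sample, p) - percentile(population, p) for p in pcts}

# Hypothetical DIQ values, chosen only to illustrate the calculation.
population_diq = [82, 88, 93, 97, 100, 103, 107, 112, 118]
sample_diq = [85, 90, 95, 98, 101, 104, 108, 112, 118]
print(percentile_gaps(sample_diq, population_diq))
```

A positive gap at the low percentiles that shrinks toward the top would correspond to the pattern described for the Grade 4 chart.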

The small number of cases available in the
sample did not permit any deletion of cases to
force the distribution for the tested random
sample in line with the total population tested, and
therefore we tentatively concluded that the best
procedure was to accept this sample as reasonably
representative and study the amount of gain from
Fall to Spring for this particular group as a basis
for comparison of the gain made by the Title I
children in this same grade, including all Title I
cases regardless of the type of project in which
they might be found and, if possible in terms of
available resources, for those in reading projects
separately.

In Chart V-2 for Grade 6 the graphed line for
the total sample looks very similar to that for
Grade 4, but the random sample even more closely
approximates the total group than in the case of
Grade 4. Thus we can say that for both Grades 4
and 6 the measured brightness of the tested random
sample certainly seems to be generally
representative of what might be expected had the
entire state been retested with Otis-Lennon in the
spring. This is especially true because of the
nature of the Otis-Lennon test and because of the
conditions under which the sample was selected. The
tested random sample was not even identified for
Spring testing purposes until nearly time for the
tests to be administered in May.

Graphs similar to those shown were prepared
for the five major Stanford subjects on which we
are concentrating our attention in making compari-
sons of Fall and Spring testing for both the ran-
dom sample and Title I. All charts contained a
line for the entire state and for the tested ran-
dom (representative) sample, both Fall and Spring.
These tests are as follows: Word Knowledge, Para-
graph Meaning, Arithmetic Computation, Arithmetic
Concepts, Arithmetic Applications. None of the
achievement test graphs can be shown, for reasons
of economy, but the selected statistics extracted
from the graphs are presented elsewhere. In sum-
mary, all such charts supported our belief that
the random (now representative) sample fairly re-
flected the state performance in each comparison
made.

In conclusion, it might be helpful to ap-
proach this evaluation of the random sample from
a slightly different point of view. The tested
random (representative) sample constitutes a group
of children for whom there is no known reason to
suspect systematic bias such as Hawthorne effect.
The growth achieved by these children in the
tested random (representative) sample over the
seven month period is certainly one realistic
touchstone as to the amount of growth to be ex-
pected during such a period under conditions ex-
isting in New Hampshire. No better data exist as
the basis for such comparisons. Statistical nice-
ties may be lacking but common sense is not. If
lack of cooperation, lack of logistical support,
and indifference prevented a more precise comparison
group this is indeed unfortunate but not the fault
of this writer or of the Title I staff. It does,
however, highlight the need for a deeper under-
standing of evaluative research on the part of all
those who wish to know, really wish to know,
whether their efforts are availing.



In Table V-1 selected percentiles are shown,
as read from the available Normal Percentile
Charts, for the total state and for the tested
random sample. No statistical significance tests
have been made for these data and none are sensi-
ble under the circumstances. Values have been
read to the nearest whole number since fractional
values are of little use and suggest a precision
not resident in the data.

Table V-2 shows the means and standard devia-
tions for raw scores on selected achievement tests,
discussed in this report, administered in the fall
of 1969. Casual inspection is sufficient to es-
tablish rather clearly that the random sample is
comparable to the total population on these param-
eters. This is an instance where tests of statis-
tical significance might be applied if it were not
for the fact that the random sample ceased to be
random when substantial numbers of pupils failed
to take the tests in the spring and, therefore,
were excluded.

If one wanted to indulge in a hypothetical
exercise one could consider the total population
tested in the fall as the universe, which would
mean that the means and standard deviations re-
ported are free of sampling error. It then would
be possible to test the extent to which the repre-
sentative sample with which we are left might be
significantly different if the assumption of ran-
domness were true. Since the writer can see no
valid purpose for doing this, such statistical
tests of significance have not been carried out.
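
For readers who wish to see what such a hypothetical exercise
would involve, a minimal sketch follows. It is an illustration
only and was not part of this study. It treats the fall
population mean and standard deviation (Table V-2) as free of
sampling error and expresses the distance of a sample mean from
the population mean in standard-error units. The function name
and the sample size used in the example are assumptions made
for illustration.

```python
import math

def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z statistic for a sample mean when the population mean and SD
    are treated as known (free of sampling error)."""
    if n <= 0 or pop_sd <= 0:
        raise ValueError("n and pop_sd must be positive")
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Grade 4 Word Meaning, Table V-2: sample mean 15.9, population mean 15.0,
# population SD 7.0. The sample size of 400 is a hypothetical round number
# chosen for illustration, not a figure taken from the report.
z = z_for_sample_mean(15.9, 15.0, 7.0, 400)
print(round(z, 2))
```

Any value obtained this way would be interpretable only if the
sample were truly random, which, as noted above, it no longer is.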

Neither Otis-Lennon nor composite prognostic 
scores have been included in this comparison be- 
cause of failure of the service centers to provide 
the necessary information. 






[Chart V-2: Normal Percentile Chart. Otis-Lennon IQ, statewide
(11,549 cases) and random sample (641 cases).]







Table V-1

A Comparison in Terms of Raw Scores of the
State-wide Populations in Grades 4 and 6 Tested in the Fall of 1969
with Fall Results for the Random (Representative) Sample
Subsequently Selected for Spring Re-testing as a Control Group

                               Grade 4                    Grade 6
               %ile      Total   Random            Total   Random
Test           Rank      State   Sample   Diff.    State   Sample   Diff.

Word            75        19       21      +2       31       32      +1
Meaning         50        14       15      +1       24       26      +2
                25         9       10      +1       18       19      +2

Paragraph       75        29       30      +1       40       40       0
Meaning         50        22       23      +1       31       33      +2
                25        16       17      +1       23       25      +2

Arithmetic      75        14       14       0       17       17       0
Computation     50        11       11       0       13       13       0
                25         8        8       0        9       10      +1

Arithmetic      75        16       16       0       16       16       0
Concepts        50        12       12       0       12       13      +1
                25         8        9      +1        9        9       0

Arithmetic      75        16       16       0       22       22       0
Applic.         50        12       12       0       16       17      +1
                25         8        9      +1       12       12       0


Table V-2

A Comparison of Means and Standard Deviations
For the Random Sample and the Total Population
Fall 1969

Grade 4
                               RAW SCORE MEAN         STANDARD DEVIATION
                    No. of    Random   Total          Random   Total
SAT: Int. I: X      Items     Sample   Population     Sample   Population

Word Meaning          38       15.9     15.0            7.1      7.0
Para. Meaning         60       24.4     23.4            9.4      9.4
Arith. Comp.          39       11.5     11.6            4.5      4.6
Arith. Concepts       32       12.9     12.7            5.2      5.3
Arith. Appl.          33       12.9     12.7            5.1      5.3

Grade 6
                               RAW SCORE MEAN         STANDARD DEVIATION
                    No. of    Random   Total          Random   Total
SAT: Int. II: X     Items     Sample   Population     Sample   Population

Word Meaning          48       25.7     25.0            8.5      8.7
Para. Meaning         64       32.9     32.2           11.2     11.5
Arith. Comp.          39       13.9     13.6            5.5      5.5
Arith. Concepts       32       13.5     13.3            5.1      5.3
Arith. Appl.          39       17.6     17.4            6.8      6.9





N.H. Statewide Testing Program Evaluation - VI



SECTION VI

The Tested Title I Population in New Hampshire
Described and Compared with
the Random (Representative) Sample

In this section we will present certain data
describing the Title I population in Grades 4 and
6 in comparison with the random (representative)
sample for these same grades. First, however, we
must note an exception which renders all compari-
sons in this study somewhat moot as a description
of the situation for Title I as a whole in this
state. Notice the large discrepancy in cases
within the Title I samples at Grade 4 and Grade 6.
In Grade 4, 431 cases are included in comparison
with 230 cases in Grade 6. These figures in nei-
ther case represent the entire Title I population
in these grades, but the population of Title I
cases in Grade 6 for whom we have complete test
data is a seriously biased population. Grade 4 is
also biased but probably not as extensively, since
it probably represents a larger proportion of
Title I cases in that grade. Certain significant
and constant differences between the two grade
populations seem to run through all the compari-
sons made subsequently. Certain communities,
especially some of the larger cities, are not
included in the study because they failed to test
either fall or spring. These data simply are NOT
representative of all New Hampshire Title I cases
but, since they are a defined sample, the basic
descriptive statistics relating to factors as
significant as chronological age range, etc.,
probably reflect the degree of representative-
ness of the Title I population as regards the
total group reasonably well.

The question must arise as to how this situa-
tion comes about. Recall that the basic law says
that every school district is entitled to Title I
funds in proportion to the number of families be-
low the poverty level plus the number of families
receiving aid-to-dependent-children, etc. It also
requires that every project within Title I shall
be evaluated, but it does not specify that this
evaluation is to be dictated by the central office
for Title I in the state, either as to the instru-
ments used or as to the methods of analysis.
Therefore, in this State, at least (and probably
in many others), any attempt to analyze data for
Title I children obtained by voluntary cooperation,
using prescribed instruments, is bound to be a
biased description of what is going on.

In New Hampshire, some of the larger cities
chose to do their own evaluations, using tests
which they determined, using methods of analysis
which they determined, and reporting only such
data as they were directed to produce by the State
Title I office. This consisted essentially of data
describing the Title I population and did not in-
clude individual test results. There was no pre-
scribed constraint that they must describe their
Title I population with respect to their own total
population. National norms were considered suffi-
cient and assumed to be comparable from test to
test. No effort has been made, to this writer's
knowledge, to summarize all such information re-
ceived from these non-conforming communities. In
other words, no one knows how well these various
communities carried out their Title I project
obligations as measured by a uniformly prescribed
test. Indeed, there may be some shining gems of
masterly accomplishment in achieving great success
with their Title I children which are not reflect-
ed in the data we are describing.

With these limitations clearly in mind, let
us take a look at the data we have for the tested
sample.

Chronological Age 

Perhaps one of the basic parameters that
should always be examined is the character of
the population with respect to chronological
age related to grade in a graded type of school
organization. In order to provide the necessary
comparative information, in the remainder of this
report only the random (representative) sample
will be compared with the Title I population, since
we have data for both Fall and Spring only for
these groups.

Note first the range of chronological ages
for the random sample (Table VI-1 and Table VI-2),
which is essentially identical with that for the
entire group tested in the fall. The percent of
boys versus girls varies slightly from Grade 4 to
Grade 6. In Grade 4, 48% are boys and 52% are
girls. In Grade 6, 53% are boys and only 47% are
girls.

Compare now the similar information for the
Title I children as tested. In Grade 4, 61% are
boys and 39% are girls and in Grade 6, 62% are
boys and 38% are girls. It is rather remarkable
that the percentage of boys in Title I is so con-
stant from Grade 4 to Grade 6, but it is even more
remarkable that these figures check very closely
with many kinds of evidence as to the percent of
boys who have difficulty of one sort or another in
school as compared to the percent of girls. Time
will not be taken to document this fact in this
study but any reader wishing to do so can find many
bits of evidence to indicate that it is boys who
generally have reading problems; it is boys who
generally get in trouble with the law and are ruled
to be juvenile delinquents; it is boys who gener-
ally tend to have emotional difficulties of various
kinds requiring their referral to some outside
agencies. Furthermore, the proportion of boys to
girls is more or less the same as reported here for
chronological age.

The median age of boys in the representative
sample for Grade 4 is 9-6, for girls 9-5, with a
median age of 9-6 for the entire sample. For Title
I the median age is 9 years 11 months for boys, 9
years 8 months for girls and the median age for
the total Title I sample is 9-10. Title I children
in Grade 4 thus average four months older than the
total population. For Grade 6 the comparable




Table VI-1

New Hampshire Statewide Testing Program - Fall 1969
Distribution of Chronological Ages
Separately by Sex and for the Total Group
Random Sample and Title I

Grade 4

Age in                  Random Sample              Title I
Years & Months       Boys  Girls  Total*      Boys  Girls  Total*

14-4  to 14-9           0      0      0          1      0      1
13-10 to 14-3           0      0      0          0      0      0
13-4  to 13-9           1      0      1          1      0      1
12-10 to 13-3           0      0      0          0      0      0
12-4  to 12-9           0      0      0          1      0      1
11-10 to 12-3           1      2      3          1      4      5
11-4  to 11-9           3      3      6         12      1     13
10-10 to 11-3           1      4      6         26      8     34
10-9                    2      1      3         11      4     15
10-8                    1      1      2          5      5     10
10-7                    1      3      4          7      2      9
10-6                    1      3      4          5      1      6
10-5                    6      6     12         12      6     18
10-4                    4      1      7          9      2     11

10-3                    3      2      5         12      5     17
10-2                    6      6     13          6      6     12
10-1                    7      7     15          8      8     16
10-0                    7      3     10          8      6     14
 9-11                  10      4     14         11      9     20
 9-10                   9      8     18         12      3     15
 9-9                   19     26     45          7      9     16
 9-8                   19     20     41         12      9     21
 9-7                   25     21     48         12     13     25
 9-6                   20     21     43          7      4     11
 9-5                   18     19     37          7      6     13
 9-4                   16     17     36          7      8     15
 9-3                   14     18     34         13     10     23
 9-2                   14     23     39         15     10     25
 9-1                   15     22     39         12      9     21
 9-0                   13     19     32         13      6     20
 8-11                  19     19     38          5     11     16
 8-10                  13     16     31          3      1      4

 8-4  to  8-9           0      0      0          0      2      2
 7-10 to  8-3           0      0      0          0      1      1

Total Ns              268    295    586        261    169    431
N Mid-12 Months       205    241    463        128     98    225
N Mid-18 Months       247    271    538        171    134    313
% Mid-12 Months      76.5   81.7   79.0       49.0   58.0   52.2
% Mid-18 Months      92.2   91.9   91.8       65.5   79.3   72.6
Median Age            9-6    9-5    9-6       9-11    9-8   9-10

*Includes Students Who Did Not Code Sex

See Section III-A, p. 5 and the Metropolitan Manual for Interpreting, p. 2, 9-11,
and 23-24 for further discussion of the age-controlled sample.




Table VI-3
New Hampshire Statewide Testing Program
Distribution of Otis-Lennon Deviation IQs
Separately for Boys, Girls, and Total Group
Tested Fall 1969
RANDOM SAMPLE



GRADE 



GUIS 



IQ 
Interval 

150 
147-149 
144*146 
141*143 
138*140 
135-137 
132*134 
129-131 
126-128 
123-125 
120-122 
117-119 
114-116 
111-113 
108 -110 
105-107 
102-104 

99-101 

96-98 

93-95 

9C-92 

87-89 

8446 

81-«3 

78-80 

75-77 

72-74 

69-71 

66-68 

63-65 

60-62 

57-59 

54-56 

51-53 

Totals 

Q3 XIU 75 
Q2 XiU 30 
Ql UU 25 



Cum. Cum. 
Ik>. Xaae Ho, taae 



1 
0 
0 
0 
2 
1 
1 
3 
4 
5 
7 

10 
14 
20 
21 
21 
23 
23 
19 
M 
U 
18 
16 
5 
7 
• 
4 
4 
0 
2 
0 
0 
1 



267 

110 
101 
89 



1M.77 
Staadav^ 0av.l4.93 



99 
99 
99 
99 
99 
99 
98 
98 
97 
95 
94 
91 
87 
82 
75 
67 
59 
50 
42 
34 
29 
24 
18 
U 
10 
7 
4 
3 
1 
1 
1 
1 
1 



1 
3 
4 

11 
11 
15 
20 
22 
26 
23 
28 
27 
24 
22 
20 
15 
8 
4 
6 
1 
2 
1 
1 



295 

112 
103 
94 

103.29 
12.39 



99 
99 
99 
97 
94 
90 
85 
78 
71 
62 
54 
44 
35 
27 
20 
13 
8 
5 
4 
2 
1 
1 
1 



TOIAL» 

Cum. 
go* Xaae 



BOYS 



GRADE 



GIRLS 



1 
0 
0 
0 
2 
1 
2 
7 
8 
16 
19 
26 
36 
43 
47 
48 
53 
50 
44 
40 
34 
34 
24 
10 
14 
9 
8 
5 
1 
2 
0 
0 
1 



595* 

m 

102 
92 



102 
13.76 



99 
99 
99 



Cum. Cum. 
Wo. laf^e Ko. lage 



TOTAL* 
Cum. 
Mo. Xage 



99 


1 


OO 


•» 


oo 


4 


99 


99 


A 
U 


oo 




QQ 

yy 


3 




99 






u 


Oft 

yo 


3 


99 


AA 

99 


J 


oo 


ii 
o 


QQ 


9 


98 


AA 

99 


c 

J 


Oft 




OA 

yo 


8 


97 


90 




90 




0*h 

yo 


5 


96 




Lj 




a 

v 


OA 


2% 


95 


94 


7 


91 


11 


91 


to 

10 


01 


90 


16 


89 


16 


88 


33 


88 


86 


27 


84 


17 


82 


46 


83 


80 


16 


76 


19 


76 


35 


76 


73 


life 


/ J. 


17 


70 


36 


71 


icy* 


91 






64 


54 


65 


56 


25 


59 


39 


55 


64 


57 


47 


26 


51 


20 


41 


48 


46 






43 


29 


35 


61 


39 


Ji 


20 


34 


23 


25 


44 


30 


24 


23 


28 


13 


17 


37 


23 


18 


21 


21 


11 


13 


34 


17 


13 


10 


14 


5 


9 


15 


12 


9 


6 


11 


4 


7 


10 


9 


7 


7 


9 


5 


6 


14 


8 


4 


4 


7 


6 


4 


10 


6 


3 


4 


6 


0 


2 


4 


4 


2 


4 


5 


3 


2 


7 


3 


1 


2 


3 


1 


1 


3 


2 


1 


4 


3 


3 


1 


7 


2 


1 


1 


2 






1 


1 


1 


1 


1 






1 


1 


1 


2 


1 






2 


1 




1 


1 






1 


1 


















322 




297 




641* 






112 




113 




113 






100 




103 




102 






91 




95 




93 






101.03 




103.85 




102.42 






15.76 




14.35 




15.11 





*The Distributions of DIQs for Boys and Girls do not sum to the Total Distribution
because of the failure of a substantial number of pupils to code sex.






Table VI-4
New Hampshire Statewide Testing Program
Distribution of Otis-Lennon Deviation IQs
Separately for Boys, Girls, and Total Group
Tested Fall 1969
TITLE I

                         GRADE 4                              GRADE 6
             BOYS       GIRLS      TOTAL*         BOYS       GIRLS      TOTAL*
IQ                Cum.       Cum.       Cum.           Cum.       Cum.       Cum.
Interval    No.  %age   No.  %age  No.  %age     No.  %age   No.  %age  No.  %age

141-143                                                        1   99     1   99
138-140                                                        0   99     0   99
135-137                  1   99     1   99                     0   99     0   99
132-134      1   99      0   99     1   99                     0   99     0   99
129-131      0   99      0   99     0   99                     0   99     0   99
126-128      0   99      1   99     1   99        1   99       1   99     2   99
123-125      0   99      0   99     0   99        0   99       0   98     0   99
120-122      1   99      1   99     2   99        0   99       1   98     1   99
117-119      0   99      1   98     1   99        1   99       3   97     4   98
114-116      1   99      0   98     1   99        3   99       0   93     3   97
111-113      4   99      2   98     6   98        2   96       1   93     3   95
108-110      4   97      2   96     6   97        4   95       5   92     9   94
105-107      4   96      4   95     8   95        1   92       5   87     6   90
102-104     10   94      5   93    15   94        9   91       5   81    14   87
 99-101      9   90     10   90    19   90        8   85      10   75    18   81
 96-98      13   87     13   84    26   86       11   79       6   64    17   73
 93-95      26   82     17   76    43   80       20   72       8   57    28   66
 90-92      20   72     17   66    37   69       11   57       9   48    20   54
 87-89      28   64     19   56    48   61       14   50      12   38    26   45
 84-86      32   53     14   45    46   50       13   40       7   25    20   34
 81-83      23   40     19   37    42   39        6   30       3   17     9   25
 78-80      22   32     11   25    33   29       11   26       6   13    17   21
 75-77      14   23     15   19    29   21        5   18       2    7     7   14
 72-74      14   18      7   10    21   15        8   15       1    4     9   11
 69-71       8   12      2    6    10   10        4    9       0    3     4    7
 66-68       9    9      3    5    12    7        1    6       2    3     3    5
 63-65       8    5      0    3     8    4        0    6       0    1     0    4
 60-62       1    2      3    3     4    3        3    6       1    1     4    4
 57-59       2    2      1    1     3    2        1    4                   1    2
 54-56       2    1      1    1     3    1        2    3                   2    2
 51-53       1    1                 1    1        1    1                   1    1
 48-50                                            1    1                   1    1

                         GRADE 4                              GRADE 6
                  BOYS     GIRLS    TOTAL*          BOYS     GIRLS    TOTAL*

Totals             257       169      427*           141        89      230*
Q3 %ile 75          93        95       94             97       101       98
Q2 %ile 50          85        88       86             89        92       91
Q1 %ile 25          78        80       79             80        86       83
Mean             85.76     87.95    86.63          88.65     94.39    90.87
Standard Dev.    12.37     12.26    12.35          13.69     13.16    13.75

*More Title I pupils were tested in the fall than in the spring but only
matched cases are included.






figures in the random sample are 11-7 for boys,
11-6 for girls and 11-6 for the total group, while
the figures for Title I at Grade 6 are 11-11 for
boys, 11-8 for girls, with a median age for the
total Title I tested sample of 11 years and 10
months, again a difference of four months.

In these tables, the middle 18-month range
has been set off from the rest of the distribution
to represent children who are substantially at
grade for age. This, in this writer's nomenclature,
is called the age controlled sample. This has been
determined for the random sample as well as for the
total fall distribution of chronological ages. In
Grade 4 it includes children from 8 years and 10
months to 10 years and 3 months, a net range of
one year and six months. In Grade 6 it includes
children 10 years and 10 months to 12 years and 3
months. In other words, the ranges for Grades 4
and 6 differ from each other by exactly two years
at the terminal points. In the representative
sample about 92% of all of the cases fall within
this age controlled range and only 8% of the rep-
resentative sample are older than the uppermost
bound of the 18-month range. In the Title I group,
in Grade 4, 29% fall above the upper bound of the
age controlled sample for the state as a whole and
only three cases, all girls, are younger than the
youngest child in the age controlled group. This
means that the percent of retardation in the Title
I population is very substantially larger than for
the group as a whole, which is consistent, of
course, with the finding that the median chrono-
logical age is substantially higher for Title I
children in Grade 4 than for the group as a whole.
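
The 29% figure for Grade 4 can be checked from the Total column
for Title I in Table VI-1. The short sketch below is offered only
as an illustration of the arithmetic; the function and variable
names are assumptions introduced here, and the counts are the
Title I totals for the age brackets older than 10 years 3 months.

```python
def percent_over_age(counts_above_bound, total_cases):
    """Percent of cases older than the upper bound of the
    age-controlled (middle 18-month) range."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * sum(counts_above_bound) / total_cases

# Title I, Grade 4 (Table VI-1, Total column), listed from the oldest
# bracket (14-4 to 14-9) down to age 10-4.
title1_grade4_above = [1, 0, 1, 0, 1, 5, 13, 34, 15, 10, 9, 6, 18, 11]
title1_grade4_total = 431  # Total N for Title I, Grade 4

print(round(percent_over_age(title1_grade4_above, title1_grade4_total)))
```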

The statistics for Grade 6 are consistent.
12% of the children in the population are older
than the upper bound of the age controlled sample,
while in Title I 29% are older.

In conclusion then, we can describe the Title
I population as generally being older; as having
essentially the same spread of chronological ages
as the total group but with a much larger propor-
tion in the upper or older age brackets than is
true of the state. Since it is obvious that chil-
dren above the upper bound of the age controlled
sample must have been retarded at least one year,
we can say that the percent of children actually
retarded in school for their grade placement is
roughly 30% for both Grades 4 and 6.

The Random Representative Sample versus Title I
Sample in Terms of Measured Mental Ability

In Tables VI-3 and VI-4 descriptive informa-
tion is given concerning the measured mental abil-
ity of the random sample and Title I, using the
Otis-Lennon Mental Ability Test deviation IQ (DIQ).
Looking first at Table VI-3, which describes the
random sample, we find that the distribution of
DIQs in Grade 4 ranges from the 50's to about 150
and that the median IQ is 101 for boys, 103 for
girls and 102 for the tested population. The same
information for Grade 6 for the representative
sample shows medians of 100 for boys, 103 for
girls, and 102 for the total. The distributions are
essentially symmetrical and nearly normal and cor-
respond very closely to comparable information for
the total state presented elsewhere. Thus these
data confirm our earlier conclusion that the New
Hampshire population is at or slightly above the
national norm sample on the Otis-Lennon.

Now looking at the data for Title I (Table
VI-4) we see a substantial contrast. The median
IQ for boys in the tested Title I group in Grade
4 is only 85, for girls 88, and for the total
group 86. In Grade 6, the median is 89 for boys,
92 for girls, with 91 as the median DIQ for the
total Title I group at this grade level.

All these tables provide cumulative percent-
ages so it is possible to tell by consulting any
table what percent of children fall above any
given DIQ level. For example, the cumulative per-
cents describing the Title I population in Grade 4
show that 90% of the youngsters in this group have
deviation IQs of 101 or lower or conversely only
about 10% exceed that near normal median value for
the state. Recall that the typical value found
for the entire State of New Hampshire was 102 at
Grade 4. At Grade 6 the comparable figure is 81%
having DIQs of 101 or lower with 19% having DIQs
higher than this. In other words, only 19% of the
Title I children tested in Grade 6 had higher
DIQs than the average level of brightness for the
state as a whole.
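
The way the cumulative percentages in these tables are read can
be illustrated with a short sketch. It is an illustration only:
the function name is an assumption introduced here, and the
counts are the Grade 6 Title I totals from Table VI-4, listed
from the 141-143 interval down to the 48-50 interval.

```python
def cumulative_percents(counts_high_to_low):
    """For intervals listed from highest to lowest, return the percent
    of cases falling at or below each interval."""
    total = sum(counts_high_to_low)
    remaining = total
    percents = []
    for count in counts_high_to_low:
        percents.append(100.0 * remaining / total)
        remaining -= count  # drop this interval before moving down
    return percents

# Three-point DIQ intervals from 141-143 down to 48-50.
intervals = [f"{low}-{low + 2}" for low in range(141, 47, -3)]

# Grade 6 Title I, Total column of Table VI-4 (N = 230).
counts = [1, 0, 0, 0, 0, 2, 0, 1, 4, 3, 3, 9, 6, 14, 18, 17,
          28, 20, 26, 20, 9, 17, 7, 9, 4, 3, 0, 4, 1, 2, 1, 1]

by_interval = dict(zip(intervals, cumulative_percents(counts)))
print(round(by_interval["99-101"]))  # percent with DIQ of 101 or lower
print(round(by_interval["72-74"]))   # percent with DIQ of 74 or lower
```

The two printed values correspond to the Grade 6 figures quoted
in the text for DIQs of 101 or lower and of 74 or lower.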

When all of these data are thoughtfully ex-
amined one can reach the conclusion that the Title
I group is definitely a selected group both with
respect to chronological age, and also for DIQ.
The Title I populations in both grades are defi-
nitely over-age, slow-learning groups. Looking
again at the cumulative percentages, we see that
15% of the Grade 4 Title I children have DIQs on
the Otis-Lennon Test of 74 or lower, while at
Grade 6, 11% fall in this category. By contrast,
for the tested random (representative) sample,
only 3% of the children in Grade 4 have DIQs of
74 or lower, while at Grade 6 the percent is 4%.

These data provide us with an opportunity to
ask a very interesting question. Is it the intent
of the Title I law to provide special help for
slow-learning children in contrast to those who
have correctable remedial defects in basic skills
areas such as reading and math? I am quite sure
the intent of the law is not clearly one or the
other but the net result of the method of selec-
tion in New Hampshire, at least, is the choice of
a group of relatively slow-learning children who
are over-age for their grade, for whom the main
task would appear to be to provide content of in-
struction suitable to their level of mental de-
velopment in a sequence and at a rate of presen-
tation suitable to their somewhat slower learning
pace. Most significantly this says, by definition
almost, that it is unreasonable to expect devel-
opment in the basic skills areas at a rate com-
mensurate with the normal rate, i.e., one year's
growth for one year's life experience in school.




N.H. Statewide Testing Program Evaluation - VIIA



The data argue strongly for instruction oriented
to the needs and learning ability rate of each
individual child in any Title I project in this
State. In the subsequent sections, comparative
data will be presented concerning what actually
took place by way of learning within the skills
areas as measured by the Stanford Achievement Test
over a period of seven calendar months, roughly
from October 15 to May 15.

To try to escape these conclusions by arguing
that Otis-Lennon does not measure "learning poten-
tial" or "learning ability" is only to quibble.
The test content is obviously not curriculum ori-
ented, especially at Grade 4. The scores on the
test tend to correlate more highly with measures
of achievement than any other test. Those iden-
tified by the test as slow learners are so iden-
tified by observation before testing. What the
test does is to quantify this factor to permit re-
lating it to other variables, not perfectly per-
haps, yet surprisingly well for its time limits and
length in terms of number of items.

The argument put forward by the authors that
the test measures "g" in the Spearman sense of the
term is interesting but irrelevant to this dis-
cussion. At this point the writer couldn't care
less how the output of the test is labelled. He
cares very much that it does validly describe the
pupils in the State of New Hampshire at these two
grade levels in a manner consistent with his own
20+ years of experience with this population and
that it describes the tested Title I sample in a
logically consistent manner. 1/

1/ These data are remarkably consistent with 
Statewide 8th grade information collected by the 
writer using the precursor of the Otis-Lennon, 
namely the Otis Quick Scoring: Gamma, Test of 
Mental Ability in 1963 and 1964. 



SECTION VII

Single-Variable Comparisons of Fall-Spring
Performance for the Random Sample and
for Title I Cases

Part A

Some Basic Measurement Problems

This is certainly not the place to enter into
a lengthy discussion as to the nature of measure-
ment in education and psychology. It is generally
accepted that every measure of every kind, even
those that appear on the surface to be quite pre-
cise, does include an error factor which is af-
fected by many influences that are essentially un-
known and unknowable. Such things as the quality
of the test items in the sense of freedom from am-
biguity, the length of the test, the applicability
of the test to the local situation in terms of in-
structional validity, the quality of the test ad-
ministration, the general emotional environment in
which the tests are administered along a continuum
that might go from stressful and emotionally up-
setting to accepted and casual: all these things
and many others affect the performance of an indi-
vidual on the day(s) he happens to take a test or
a series of tests.

The transformation of the raw score he earns,
usually the number right, into a standard score
does not in any way diminish or correct for these
random error factors, although it may change the
magnitude of the computed estimate of error be-
cause the standard deviation is arbitrarily alter-
ed.

Note that these errors of measurement (SEm)
are present even when a single test is given and
only one score available. In fact the SEm is pri-
marily useful to give the user an idea of how much
dependence he can put on a given test score. These
errors vary in magnitude from one subtest to an-
other within a battery depending upon the standard
deviation of the raw scores and, particularly, the
reliability of the test.
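
The dependence of the SEm on the standard deviation and the
reliability can be made concrete with the conventional formula,
SEm = SD x sqrt(1 - reliability). The report does not state the
formula it relies on, so what follows is a sketch under that
assumption. The function name is introduced here for illustration,
and the reliability of .90 in the example is a hypothetical value,
not one taken from the test manuals.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Conventional SEm: SD times the square root of (1 - reliability)."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# A raw-score SD of 7.0 (Grade 4 Word Meaning, Table V-2) combined with a
# hypothetical reliability of 0.90 chosen only for illustration.
sem = standard_error_of_measurement(7.0, 0.90)
print(round(sem, 2))
```

Because the SEm scales directly with the raw-score standard
deviation, values for subtests of different length and spread are
not directly comparable.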

Standard errors of measurement in terms of 
raw scores are definitely not comparable from 
test to test. 

When one test or test battery is given, let
us say in the Fall, and this same test or an al-
ternative form is given in the Spring, the diffi-
culty of estimating the random error or random
variation in the differences between tests is
compounded. The error of measurement for a single
test might actually operate in the case of one in-
dividual to lower his score in the Fall whereas in
the Spring it might enlarge his score, thus making
it seem as if he had made an enormous gain over
the period of time in question. The opposite
might be true also so that an individual who, in
the truest sense of the word, had made normal pro-
gress throughout the period between tests might
show up on the paired test scores as not having
accomplished much of anything. The correlation
between forms administered is also a factor here.
If the same form is given over again the effects
of practice compound the problem. We could elabo-
rate in further detail but this is better done in
a different context.
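
How the errors in two testings combine can be sketched with the
usual expression for the standard error of a difference between
two scores, the square root of the sum of the two squared SEm
values. This is a sketch only: it assumes the Fall and Spring
errors are independent and that both scores are on the same scale,
and the function name and example values are hypothetical.

```python
import math

def standard_error_of_difference(sem_fall, sem_spring):
    """Standard error of a Fall-to-Spring difference score, assuming the
    two errors of measurement are independent and on the same scale."""
    if sem_fall < 0 or sem_spring < 0:
        raise ValueError("standard errors must be non-negative")
    return math.sqrt(sem_fall ** 2 + sem_spring ** 2)

# Hypothetical SEm values of 3 and 4 raw-score points.
print(standard_error_of_difference(3.0, 4.0))
```

The error attached to a single pupil's gain is therefore larger
than the error attached to either score taken alone.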

From the above it might be concluded that it
is useless to test. Nothing could be farther from
the truth. It is certainly very dangerous to use
single paired comparisons (from Fall to Spring)
for individuals as being fully true and dependable
measures of what has happened to a child over a
period of time between tests and within any one
subject matter area. Only cumulative testing can
do this.

Wherein Does the Error of Measurement Reside? 

In this particular paragraph it is essential that we phrase our
discussion in the form of a question to which no answer can ever be
given in a completely definitive way. It is obvious from the
discussion that has preceded this paragraph that the error of
measurement may be in part due to the instrument itself. This kind of
"error" is often estimated by computing a reliability coefficient for
the instrument by correlating alternate halves of the test, so that
one collection or sequence of items constituting one-half the test is
balanced by a second sequence of items as nearly as possible measuring
the same thing. This is the so-called corrected split-half technique
and, when the obtained correlations are corrected to allow for the
full length of test (Spearman-Brown Prophecy Formula), it does indeed
give at least a rough estimate of the amount of stability
characteristic of the test itself, since the effect of the performance
of any child is substantially ruled out by the fact that alternative
items usually are taken within seconds of each other.
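
The corrected split-half technique described above can be sketched as follows. This is a minimal illustration, assuming an odd-even split of the items and the standard Spearman-Brown formula r_full = 2r / (1 + r); the helper names and the item data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_part, length_factor=2.0):
    """Step a part-test correlation up to a test length_factor times as long."""
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)

def split_half_reliability(item_scores):
    """item_scores holds one list of 0/1 item results per pupil."""
    first_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(first_half, second_half))

# Hypothetical results for five pupils on a six-item test (1 = right, 0 = wrong).
ITEM_SCORES = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
reliability = split_half_reliability(ITEM_SCORES)
print(round(reliability, 3))
```

The correction always raises the half-test correlation, because each half is only half as long as the test whose reliability is wanted.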

Other methods of computing the reliability of the test, however, call
for the administration of the same form after an interval of time or
the administration of two equivalent forms sequentially. In either
case, there are additional difficulties involved.

In both cases the general perception of the test by the person taking
it now becomes a factor. On one day an individual may be feeling fine
and at his peak level of performance while on another day quite the
opposite may be true. Failure to understand the directions, illness,
fatigue, distraction, emotional upset, poor test administration, fear:
all of these factors may enter in to cause one test result to be
different from another. When one studies test results for a group of
students tested at an interval of seven months, as in this report, the
personal factors relating to the pupils tested, as contrasted to the
factors embodied in the instrument itself, may be overwhelmingly
important, especially if the pupil has had "a bad year". Indeed, there
is a strong possibility that widely varying test scores for an
individual, especially over a short time span, may be an indication of
the emotional instability of that individual, especially when one test
follows closely on the heels of the other.

The correlations between two tests administered seven months apart
will always be lower than the correlation coefficient between paired
scores on the test taken only days apart. One of several very
important factors involved here is the amount of learning and
forgetting that has taken place between the two test administrations,
which alone would account for much of the difference in the results
obtained from one time to the other, e.g. fall to spring. This is not
chance error subject to estimation by any formula. It is better known
as "bias", i.e., the known and appraisable effect of what would
generally be described as the experimental factors plus "static".

Perhaps this can best be seen if one considers a simple task, such as
answering a single item, where the unreliability of the test question
may be considered to be effectively minimized by the very nature of
the item.




Consider the addition of three specified two-place numbers as the item
in question, using a free response mode of response. The answer will
be right if the individual (1) knows the procedure for adding three
two-place numbers, (2) knows his 100 addition facts, (3) has the
ability to retain in mind the partial sum of the first two numbers in
the first column at the right while he adds to this partial sum the
third number in the column, and (4), if the sum of the first column is
more than 9, has the ability to carry the remainder to the second
column.

In other words, even in this apparently simple task of adding three
two-place numbers there is a level of complexity that is not at all
obvious on the surface. Any failure to remember a combination of two
numbers, or any forgetting of the partial sum in the process of
addition, etc., results in an error that makes the final answer wrong.
There are no partial credits. The second time the test is taken,
especially if there has been drill and supplementary instruction in
the task involved, the individual may have a better chance of
answering the item correctly. On the other hand, if the item had been
thought to have been effectively taught at the time it was first
tested (according to the teacher's judgment) but was not touched on in
the interim period, i.e., there was no maintenance of skills,
forgetting would be a significant factor causing some individuals to
have a higher potential for error the second time than the first.

It should be obvious from this illustration that it is not the test
item that is at fault but it is something that actually happens to the
individual child. This is bias, not random error, even though it does
result in a lower correlation coefficient between Fall and Spring
tests than between tests readministered within days.

Any test is, perforce, made up of a fairly large number of items, and
the test as a whole may or may not be homogeneous in the statistical
sense of the word. Even an arithmetic computation test cannot truly be
considered to be homogeneous because of the multiplicity of learnings
involved. Even a test limited to addition alone cannot necessarily be
considered to be entirely homogeneous. The one really homogeneous test
in arithmetic (and this even might be challenged) would be a test
involving knowledge of the hundred addition facts or the hundred
subtraction facts.

When one moves into the field of reading it is obvious that the
possibilities for errors increase, since the content of the reading
tests themselves is not material taken literally out of the body of
instructional material, but represents a novel body of content on
which the individual exercises his developed skill in reading
materials suitable for his level of development, whatever they may be.
Even repeating the same paragraphs after a period of seven months, as
was done in the New Hampshire situation, is no guarantee that the
result on the second test is effectively reflecting the outcome of
instruction that has taken place between the first test and the second
test, especially if the test material is (or was) not particularly
well suited to the needs of the child on one or the other
administration. Lack of local validity in the choice of paragraphs
could seriously affect the score distribution. This cannot be charged
off as error of measurement. Someone goofed! In a really good
corrective program the correlation between first and second test may
be lowered by the very effectiveness of the corrective instruction. A
child having no good method of word attack on novel words initially
but who improves greatly in this skill under instruction may make
enormous gains on retest over a substantial period of time.

All of the above discussion boils down to one simple fact. When tests,
made by fallible human beings, are administered by fallible human
beings to fallible human beings, chance variation and bias must be
expected. The tests are short; the tests are imperfect as shown by
their reliability coefficients and by their correlations with valid
criterion measures; but most important, the individuals taking the
tests are variable in their performance from day to day, to say
nothing of their performance over a period of seven months.

Under these circumstances it only makes sense (1) to restrict one's
broad interpretation of such data to general trends and (2) to
identify the individuals who show extremely atypical performance over
a period of time as the ones most likely to need special attention. In
identifying these extreme cases, it does indeed make sense to keep in
mind the standard error of the difference between scores, since it
then gives one confidence that an observed difference for an
individual that is several times the standard error of the difference
is most probably the one where some extraneous influences can be
detected and analysed.
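
The screening rule just described can be sketched in code. The sketch assumes the usual formula for the standard error of a difference between two scores with independent errors, sqrt(SEm1^2 + SEm2^2). The `typical_gain` parameter, the multiple of 3, and all scores shown are hypothetical choices made only for illustration.

```python
import math

def se_of_difference(sem_first, sem_second):
    """Standard error of the difference between two scores whose errors are independent."""
    return math.sqrt(sem_first ** 2 + sem_second ** 2)

def flag_atypical(pairs, sem_first, sem_second, typical_gain=0.0, multiple=3.0):
    """Return positions of pupils whose fall-to-spring change departs from
    typical_gain by at least `multiple` standard errors of the difference."""
    threshold = multiple * se_of_difference(sem_first, sem_second)
    return [
        index
        for index, (fall, spring) in enumerate(pairs)
        if abs((spring - fall) - typical_gain) >= threshold
    ]

# Hypothetical (fall, spring) raw scores for four pupils.
PAIRS = [(15, 21), (10, 28), (20, 8), (12, 17)]
flagged = flag_atypical(PAIRS, sem_first=2.4, sem_second=2.4, typical_gain=6.0)
print(flagged)
```

Only the pupils whose change lies far outside the band of chance variation are singled out; the others are left to the interpretation of general trends.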

SECTION VII
Part B

Comparisons in Raw Scores and 
Corresponding Grade Equivalents for the 
Random Sample and Title I Cases 

In the previous pages we have discussed at length the hazards involved
in making comparisons over a short period of time and the contribution
of error of measurement and bias in explaining differences that do
occur.

We are now ready to look at the actual data for fall testing (October
1969) and spring testing (May 1970) for the random (representative)
sample and for Title I cases.

In Section V we have already discussed the need for a random sample
that would be representative of the state as a whole, and we have
documented the fact that this sample, selected carefully by sound
statistical methods, was not in fact a random sample when it was
actually used because of the failure of communities to test in
accordance with the specifications. We have also established the fact
that the tested random sample turned out to be quite representative of
the statewide tested population. Therefore we feel that we are now
ready to make actual score comparisons between fall and spring which
will be valid.

In the first two tables presenting the data for fall and spring
testing we have recorded the percentiles corresponding to the selected
percentile ranks 75-50-25. We are not going to approach the
interpretation of these tables so much from a statistical point of
view as from a common sense point of view.

It must be remembered in this context that this may be the first time
that this test-retest procedure involving a control sample has been
employed in an operating situation; i.e., in a situation which was not
part of an experimental research program. 1/ The intent was to find
and test a sample which would be representative of our state and would
therefore reveal what the goal should be for typical New Hampshire
children in terms of gains on selected subtests of the Stanford
Achievement Test. With this information available, obviously sounder
judgments could be made about the performance of our Title I children.

This report has dealt extensively with the disadvantages and the
inadequacies of grade equivalents as they are presently obtained and
interpreted for measuring gain. It is for this reason that the gains
have been listed first in raw score form, making possible a variety of
interpretative procedures. The scores also have been interpreted in
terms of the grade equivalents in order to make the pattern of
interpretation fall into something comparable to the expected or
traditional analysis as specified by the U.S. Office of Education
Title I staff.

The median raw score gains have been circled in Tables VII-B-1 and
VII-B-2 in order to make them stand apart from the other percentiles
and to simplify the interpretation of these tables. As we look at the
median of each test for the random samples in Grades 4 and 6, the
first reaction is one of some consternation that the gains are as
small as they are.

In these tables, as in some other tables in this report, the number of
items in each test is given in parentheses in the margin alongside the
test names. In making this evaluation, it is first necessary to make a
value judgment or an assumption concerning the suitability of the test
for the local curriculum. Keep in mind that the test items also have
been arranged generally in order of in-


1/ Much credit must go to Mr. Richard Hodges, now State Director for
   Title I, for suggesting the procedure at the staff evaluation
   session following the 1968-69 testing program and implementing the
   procedure for the 1969-70 program.






Table VII-B-1

NEW HAMPSHIRE STATEWIDE TESTING PROGRAM 1969-70
PERCENTILES CORRESPONDING TO SELECTED PERCENTILE RANKS
WITH CORRESPONDING GRADE EQUIVALENTS FROM STANFORD GRADE NORMS
AND FROM TABLE OF EQUIVALENT METROPOLITAN NORMS

Random Sample

Grade 4: Stanford Intermediate I and Comparable Metropolitan

                          %ile    Raw Score        Stanford Grade Equiv.   Metropolitan Grade Equiv.
Test (no. of items)       Rank  Fall  Spr.  Gain    Fall  Spr.  Gain Dev.*    Fall  Spr.  Gain Dev.*

Word Meaning (38)           75    21    27     6     4.9   5.9   1.0   +.3     5.2   6.4   1.2   +.5
                            50    15    22     7     3.9   5.1   1.2   +.5     4.2   5.4   1.2   +.5
                            25    10    16     6     3.3   4.1    .8   +.1     3.6   4.4    .8   +.1
Paragraph Meaning (60)      75    30    40    10     4.6   5.9   1.3   +.6     5.3   6.7   1.4   +.7
                            50    23    31     8     3.8   4.7    .9   +.2     4.4   5.4   1.0   +.3
                            25    17    24     7     3.0   3.9    .9   +.2     3.3   4.5   1.2   +.5
Arithmetic Comp. (39)       75    14    23     9     4.0   5.2   1.2   +.5     4.6   6.0   1.4   +.7
                            50    11    18     6     3.6   4.5    .9   +.2     4.1   5.2   1.1   +.4
                            25     8    13     5     3.1   3.8    .7     0     3.5   4.3    .8   +.1
Arithmetic Concepts (32)    75    16    20     4     4.8   5.5    .7     0     5.2   5.9    .7     0
                            50    12    16     4     4.1   4.8    .7     0     4.4   5.2    .8   +.1
                            25     9    11     2     3.3   3.9    .6   -.1     3.5   4.2    .7     0
Arithmetic Appl. (32)       75    16    21     5     4.6   5.5    .9   +.2     5.0   6.1    .5   -.2
                            50    12    16     4     4.0   4.6    .6   -.1     4.2   5.0    .8   +.1
                            25     9    11     2     3.6   3.9    .3   -.4     3.8   4.1    .3   -.4

Grade 6: Stanford Intermediate II and Comparable Metropolitan

                          %ile    Raw Score        Stanford Grade Equiv.   Metropolitan Grade Equiv.
Test (no. of items)       Rank  Fall  Spr.  Gain    Fall  Spr.  Gain Dev.*    Fall  Spr.  Gain Dev.*

Word Meaning (48)           75    32    36     4     7.3   8.0    .7     0     8.1   9.1   1.0   +.3
                            50    26    31     5     6.2   7.1    .9   +.2     6.8   7.8   1.0   +.3
                            25    19    25     6     5.1   6.0    .9   +.2     5.5   6.5   1.0   +.3
Paragraph Meaning (64)      75    40    --    --     6.7   7.8   1.1   +.4     7.6   9.2   1.6   +.9
                            50    33    39     6     5.9   6.6    .7     0     6.7   7.5    .8   +.1
                            25    25    29     4     4.8   5.3    .5   -.2     5.5   6.0    .5   -.2
Arithmetic Comp. (39)       75    17    --    --     5.9   6.8    .9   +.2     7.2   8.2   1.0   +.3
                            50    13    --    --     5.2   5.9    .7     0     6.0   7.2   1.2   +.5
                            25    10    12     2     4.6   5.0    .4   -.3     5.4   5.8    .4   -.3
Arithmetic Concepts (32)    75    16    --    --     6.5   7.6    .9   +.2     7.1   8.5   1.4   +.7
                            50    13    --    --     5.9   6.5    .6   -.1     6.4   7.1    .7     0
                            25     9    11     2     4.9   5.4    .5   -.2     5.3   5.8    .5   -.2
Arithmetic Appl. (39)       75    22    25     3     6.6   7.4    .8   +.1     7.4   8.2    .8   +.1
                            50    17    19     2     5.7   6.1    .4   -.3     6.4   6.9    .5   -.2
                            25    12    13     1     4.6   4.9    .3   -.4     5.0   5.4    .4   -.3

* Represents the deviation from the expected gain of .7 of a calendar year,
  often inaccurately designated 7 months of a school year.






Table VII-B-2

NEW HAMPSHIRE STATEWIDE TESTING PROGRAM 1969-70
PERCENTILES CORRESPONDING TO SELECTED PERCENTILE RANKS
WITH CORRESPONDING GRADE EQUIVALENTS FROM STANFORD GRADE NORMS
AND FROM TABLE OF EQUIVALENT METROPOLITAN NORMS

Title I

Grade 4: Stanford Intermediate I and Comparable Metropolitan

                          %ile    Raw Score        Stanford Grade Equiv.   Metropolitan Grade Equiv.
Test (no. of items)       Rank  Fall  Spr.  Gain    Fall  Spr.  Gain Dev.*    Fall  Spr.  Gain Dev.*

Word Meaning (38)           75    11    16     5     3.5   4.1    .6   -.1     3.8   4.4    .6   -.1
                            50     8    12     4     3.1   3.6    .5   -.2     3.4   3.9    .5   -.2
                            25     5     8     3     2.7   3.1    .3   -.4     3.0   3.4    .4   -.3
Paragraph Meaning (60)      75    19    25     6     3.2   4.0    .8   +.1     3.6   4.6   1.0   +.3
                            50    15    19     4     2.8   3.2    .4   -.3     3.1   3.6    .5   -.2
                            25    12    15     3     2.5   2.8    .3   -.4     2.7   3.1    .4   -.3
Arithmetic Comp. (39)       75    12    18     6     3.7   4.5    .8   +.1     4.2   5.2   1.0   +.3
                            50     9    13     4     3.3   3.8    .5   -.2     3.7   4.3    .6   -.1
                            25     6    10     4     2.7   3.5    .8   +.1     3.0   3.9    .9   +.2
Arithmetic Concepts (32)    75    11    14     3     3.9   4.5    .6   -.1     4.2   4.8    .6   -.1
                            50     8    10     2     3.0   3.6    .6   -.1     3.2   3.9    .7    .0
                            25     6     8     2     2.5   3.0    .5   -.2     2.7   3.2    .5   -.2
Arithmetic Appl. (32)       75    11    14     3     3.9   4.2    .2   -.5     4.1   4.5    .4   -.3
                            50     8    10     2     3.4   3.8    .4   -.3     3.6   4.0    .4   -.3
                            25     5     6     1     2.9   3.0    .1   -.6     3.1   3.2    .1   -.6

Grade 6: Stanford Intermediate II and Comparable Metropolitan

                          %ile    Raw Score        Stanford Grade Equiv.   Metropolitan Grade Equiv.
Test (no. of items)       Rank  Fall  Spr.  Gain    Fall  Spr.  Gain Dev.*    Fall  Spr.  Gain Dev.*

Word Meaning (48)           75    21    27     6     5.4   6.4   1.0   +.3     5.8   7.0   1.2   +.5
                            50    16    21     5     4.6   5.4    .8   +.1     4.9   5.8    .9   +.2
                            25    12    16     4     3.9   4.6    .7    .0     4.2   4.9    .7    .0
Paragraph Meaning (64)      75    27    34     7     5.0   6.0   1.0   +.3     5.7   6.8   1.1   +.4
                            50    20    26     6     4.2   4.9    .7    .0     4.8   5.6    .8   +.1
                            25    17    20     3     3.8   4.2    .4   -.3     4.4   4.8    .4   -.3
Arithmetic Comp. (39)       75    14    17     3     5.4   5.9    .5   -.2     6.3   7.2    .9   +.2
                            50    10    12     2     4.6   5.0    .4   -.3     5.4   5.8    .4   -.3
                            25     7     9     2     3.8   4.4    .6   -.1     4.3   5.1    .8   +.1
Arithmetic Concepts (32)    75    13    16     3     5.9   6.5    .6   -.1     6.4   7.1    .7    .0
                            50     9    11     2     4.9   5.4    .5   -.2     5.3   5.8    .5   -.2
                            25     7     8     1     4.3   4.6    .3   -.4     4.6   4.9    .3   -.4
Arithmetic Appl. (39)       75    16    18     2     5.6   5.9    .3   -.4     6.3   6.7    .4   -.3
                            50    12    13     1     4.6   4.9    .3   -.4     5.0   5.4    .4   -.3
                            25     9    10     1     4.0   4.2    .2   -.5     4.2   4.5    .3   -.4

* Represents the deviation from the expected gain of .7 of a calendar year,
  often inaccurately designated 7 months of a school year.






creasing difficulty, or in cycles of increasing difficulty within the
subdivisions of the subtest content. Arranging the items in order of
difficulty effectively counteracts any claim that the tests may have
been too highly speeded. A less able child will do all he can do in
the time allowed because the probability is great that the items
beyond the point where he stops, assuming of course that he
understands the directions, etc., will be too hard for him to answer
correctly.

The evidence is clear from the score distributions that surprisingly
few children guess wildly. For example, many times the median score of
a distribution is below the chance level. If we are satisfied on these
points, it leaves us with the necessity of asking if the number of
points of gain shown in the first section of the tables is reasonable
in terms of the number of items in each test.

We must assume that the Stanford Intermediate I test contained
material substantially appropriate for use at the beginning of Grade
4. In Grade 4, the average scores earned in the arithmetic tests are
on the low side in comparison with the number of items, but so is the
performance of New Hampshire children according to Stanford norms.

In Grade 6, Intermediate II Battery, the New Hampshire median scores
also tend to fall rather substantially below one-half the number of
items in each of the three arithmetic tests. Before making any
critical judgment at this point, it must be remembered that these
batteries are intended to be suitable for two grades; namely, 4 and 5
for the Intermediate I, and 6 and 7 for the Intermediate II.
Therefore, it is only right and proper that the number of items
answered correctly should be somewhat less than half of the total
number of items in the test in the lower of the two grades at each
level.

More important than the median score at the beginning of the year is
the amount of gain over the seven months between tests. Spring medians
do go up appreciably, but do they go up enough? The amount of gain is
more or less dependent upon the extent to which the content of the
test is very specific to the instruction taking place during the
period of time between first and second testing. Only an item-by-item
subjective analysis of the test content by competent curriculum
specialists will reveal to what extent the test items do measure the
content of instruction.

In Table VII-B-3, the Stanford raw scores corresponding to a grade
equivalent of 4.1 (October 15) and 4.8 (May 15) are tabled as nearly
as these can be determined from the published norm tables for
translating raw scores to grade equivalents. 1/ Some fractional values
have been given in this table because there were no precise
corresponding scores given for 4.1 or 4.8 in the tables.

An examination of this table is very enlightening. Four plus points of
gain are expected, according to the norms, in Word Meaning and about
six in Paragraph Meaning at the Intermediate I level; also a five
point score gain in Arithmetic Computation is stipulated. However, the
expected gain for Concepts and Applications drops to four plus points.
Note that these are the gains expected for the stipulated seven
months, which is really .7 of a calendar year. 2/
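
The expected gains quoted above are simply the May 15 norm minus the October 15 norm. The sketch below repeats that subtraction for the Intermediate I (Grade 4) figures as read from Table VII-B-3, using exact fractions so that the half-point values are not rounded; the variable and function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# (October 15 norm, May 15 norm) raw scores, Intermediate I, Grade 4,
# as read from Table VII-B-3.
NORMS_INTERMEDIATE_I_GRADE_4 = {
    "Word Meaning": (16, 20 + HALF),
    "Paragraph Meaning": (26, 32),
    "Arithmetic Computation": (15, 20),
    "Arithmetic Concepts": (12, 16),
    "Arithmetic Applications": (13, 17 + HALF),
}

def expected_gains(norms):
    """Expected raw-score gain over the seven months: May norm minus October norm."""
    return {
        test: Fraction(may) - Fraction(october)
        for test, (october, may) in norms.items()
    }

GAINS = expected_gains(NORMS_INTERMEDIATE_I_GRADE_4)
for test, gain in GAINS.items():
    print(f"{test}: {float(gain)}")
```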

In the Intermediate II Battery, very nearly comparable values are
expected to result from seven months of in-school instruction between
October and May.

These are not large gains. One would be much happier to have them at
least half again as large. However, gain in score is not solely within
the control of the test maker or publisher. Decreased emphasis on
"book learning" with increased competition from other organized
activities in school may be partly causative.

With these data in mind, let us go back to Table VII-B-1 and look to
see what the students in New Hampshire did over the same length of
time. In the seven months from the middle of October to the middle of
May, the median for New Hampshire children in the random sample for
Grade 4 reached or exceeded the amount of raw score and grade
equivalent gains expected according to Stanford norms in all tests
except Arithmetic Applications, where there was a .1 year deficit
which, in part, is a smoothing effect.

In Grade 6 on the Intermediate II Battery, New Hampshire children in
the random sample gained .2 year more than expected in Word Meaning,
made the expected gain in Paragraph Meaning and in Arithmetic
Computation, had a minus .1 year deviation from the norm in Concepts,
and were .3 year behind the expected gain in Arithmetic Applications.
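
The deviations discussed here follow from one subtraction: the fall-to-spring gain in grade equivalents less the expected gain of .7. A minimal sketch is given below, using decimal arithmetic so that tenths of a year stay exact. The function name is hypothetical, and the two sample rows are the Grade 4 random-sample medians for Word Meaning and Arithmetic Applications on Stanford norms.

```python
from decimal import Decimal

EXPECTED_GAIN = Decimal("0.7")  # seven months, i.e. .7 of a calendar year

def gain_and_deviation(fall_ge, spring_ge, expected=EXPECTED_GAIN):
    """Return (gain, deviation from the expected gain) in grade-equivalent units."""
    gain = Decimal(spring_ge) - Decimal(fall_ge)
    return gain, gain - expected

# Grade 4 random-sample medians, Stanford grade equivalents (fall, spring).
word_meaning = gain_and_deviation("3.9", "5.1")
arithmetic_applications = gain_and_deviation("4.0", "4.6")
print(word_meaning, arithmetic_applications)
```

Passing the grade equivalents as strings is a deliberate choice: binary floating point would turn a difference such as 4.6 minus 4.0 into something slightly different from .6.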

Comparing Title I Gains with Expected Gains

It is always a problem to know what to expect of a group demonstrated
ahead of time to be a less able group in terms of mental ability and
known to come from the disadvantaged strata within the state. Although
the generalization is somewhat dangerous, an examination of Table
VII-B-2 compared to Table VII-B-1 suggests that the 75th percentile of
the Title I children is not too far from the median value for the
state as a whole.



1/ There is a question as to the appropriate norm to use because the
   tests were not all administered within the specified time limits in
   either fall or spring. 4.1 (or 6.1) versus 4.8 (or 6.8) seem
   suitable for comparison purposes.

2/ Differences between medians of successive grades are taken to
   represent the gain expected in a school year but are actually
   representative of a calendar year.






Table VII-B-3

Expected Change in Selected Stanford Subtest Scores
Over Seven Months of In-School Instruction

INTERMEDIATE I: GRADE 4

SEmeas.                                 Raw Score Norm for 1/    Expected
as Reported   Test                      October 15    May 15       Gain

   2.38       Word Meaning                  16          20½         4½
   3.10       Paragraph Meaning             26          32          6
   2.36       Arithmetic Computation        15          20          5
   2.40       Arithmetic Concepts           12          16          4
   2.32       Arithmetic Applications       13          17½         4½

INTERMEDIATE II: GRADE 6

SEmeas.                                 Raw Score Norm for 1/    Expected
as Reported   Test                      October 15    May 15       Gain

   2.73       Word Meaning                  25½         29½         4
   3.22       Paragraph Meaning             35          40½         5½
   2.41       Arithmetic Computation        18½         23          4½
   2.51       Arithmetic Concepts           14          18          4
   2.50       Arithmetic Applications       19          23          4

1/ Values read and interpolated from raw score-grade score tables
   given in accessory materials and handscoring booklets.






Table VII-B-4

New Hampshire Statewide Testing Program 1969-70
Fall and Spring Raw Score Means, Standard Deviations and Gains
Random Sample and Title I Cases

Grade 4

Part A - Random Sample (N = 585)

                 No. of      Raw Score Means        Raw Score Standard Dev.
Test             Items     Fall    Spring    Gain       Fall     Spring

Word Mean.         38     15.92    21.87     5.95       7.10      7.26
Para. Mean.        60     24.44    31.97     7.53       9.43     10.50
Arith. Comp.       39     11.46    18.34     6.88       4.47      6.97
Arith. Conc.       32     12.88    16.37     3.49       5.20      6.06
Arith. Appl.       33     12.93    16.21     3.28       5.07      6.21

Part B - All Title I Cases (N = 434)

                 No. of      Raw Score Means        Raw Score Standard Dev.
Test             Items     Fall    Spring    Gain       Fall     Spring

Word Mean.         38      9.13    13.21     4.08       4.98      6.11
Para. Mean.        60     16.35    20.85     4.50       5.97      7.93
Arith. Comp.       39      9.94    14.46     4.52       4.03      6.28
Arith. Conc.       32      9.37    11.71     2.34       4.41      5.17
Arith. Appl.       33      8.89    10.88     1.99       4.25      5.53

Grade 6

Part A - Random Sample (N = 645)

                 No. of      Raw Score Means        Raw Score Standard Dev.
Test             Items     Fall    Spring    Gain       Fall     Spring

Word Mean.         48     25.67    30.07     4.40       8.49      8.22
Para. Mean.        64     32.91    38.31     5.40      11.23     12.21
Arith. Comp.       39     13.91    18.48     4.57       5.45      7.37
Arith. Conc.       32     13.52    16.53     3.01       5.10      6.49
Arith. Appl.       39     17.57    19.59     2.02       6.77      7.75

Part B - All Title I Cases (N = 235)

                 No. of      Raw Score Means        Raw Score Standard Dev.
Test             Items     Fall    Spring    Gain       Fall     Spring

Word Mean.         48     17.45    22.23     4.78       7.32      8.22
Para. Mean.        64     22.56    28.16     5.60       9.17     10.60
Arith. Comp.       39     11.67    14.18     2.51       5.30      6.49
Arith. Conc.       32     10.36    12.70     2.34       4.50      5.61
Arith. Appl.       39     13.30    14.61     1.31       5.86      6.00




Table VII-B-5

A Comparison of Fall and Spring Gains
Involving the Median for Title I versus the
25th Percentile for the Random Sample

Grade 4
SAT: Intermediate I

                    RANDOM SAMPLE RAW SCORE       TITLE I RAW SCORE
                        25th PERCENTILE                MEDIAN
Test                Fall    Spring    Gain        Fall    Spring    Gain

Word Meaning         10       16        6           8       12        4
Para. Meaning        17       24        7          15       19        4
Arith. Comp.          8       13        5           9       13        4
Arith. Conc.          9       11        2           8       10        2
Arith. Appl.          9       11        2           8       10        2

Grade 6
SAT: Intermediate II

                    RANDOM SAMPLE RAW SCORE       TITLE I RAW SCORE
                        25th PERCENTILE                MEDIAN
Test                Fall    Spring    Gain        Fall    Spring    Gain

Word Meaning         19       25        6          16       21        5
Para. Meaning        25       29        4          20       26        6
Arith. Comp.         10       12        2          10       12        2
Arith. Conc.          9       11        2           9       11        2
Arith. Appl.         12       13        1          12       13        1





(In order to highlight this comparison, the pertinent information has
been extracted from the above tables and is reproduced separately in
Table VII-B-5.) Perhaps it is not unreasonable to say that the 25th
percentile for the random sample constitutes a better goal for
children in Title I than the state median does. For example, in Grade
4 of the Random Sample the Fall 25th percentile for Word Meaning is 10
points while the Fall median for Title I is 8 points. In Paragraph
Meaning the Fall 25th percentile for the Random Sample (Grade 4) is 17
compared to the Title I median of 15 points. In Arithmetic Computation
the Fall median for Title I is 9 and the 25th percentile for the
Random Sample is 8. This comparison can be carried out too rigorously,
but it merely suggests a line of inquiry to the reader who examines
and analyzes these data for himself.



In Table VII-B-1, comparable Metropolitan grade equivalents also are
given. These were obtained from a table of equated grade equivalents
for Metropolitan and Stanford provided by the publisher. By using the
Stanford grade equivalent as the entry figure, it is possible to see
what this grade equivalent would be in terms of Metropolitan '70
norms.

According to these new Metropolitan norms, New Hampshire children are
doing definitely better than the national group was doing at the time
Metropolitan was standardized in all subjects at the Grade 4 level and
in Grade 6 in all but Concepts (normal gain) and Applications (deficit
of .2 year).

Metropolitan norms have the advantage of being much more up-to-date
than are Stanford norms, but the interesting fact suggested by this
analysis is that the net differences from fall to spring do not seem
to be very different, with one or two exceptions. Thus, it was harder
to "make the grade" with Stanford norms, but the net gains in grade
equivalents from fall to spring aren't very different except in
Computation in Grade 6, where Metropolitan norms seem to reflect a
national downward trend. New Hampshire children make a net gain of 5
tenths according to Metropolitan norms while being just at grade on
Stanford norms.



Summary 

The picture arising out of the use of Metropolitan values stated to be
equivalent to earned Stanford values is far more favorable to the
State and in general must be said to be far more closely in line with
what would be expected in terms of the mental ability of the children
tested. Incidentally, it coincides far more closely, too, with
previous results on former annual statewide testing programs at the
8th grade level, where New Hampshire consistently has been at or near
the national norm.



SECTION VIII

Bivariate Comparison of Fall-Spring Performance
for the Random Sample and for Title I Cases

Part A

Bivariate Distributions as a Means of
Comparing Fall and Spring Test Results

Whenever a test or series of tests is given at one
period of time and repeated at a subsequent period,
it is possible, of course, to study the stability of
performance of the group as a whole in terms of the
correlation between the variables and, at the same
time, to study the extent to which an individual
differs in his performance from the first period to
the last by locating that individual on the
bivariate distribution surface for the two
variables. To put this in simpler language, it is
possible to make a plot with the first test on one
axis and the second test on the other axis, and from
this plot work out the correlation coefficient
giving the relationship between the two measures.
The bivariate plot is not a necessary step in the
computation of the correlation; it is more like a
light that keeps the aware person from accepting a
statistically foolish result, which too often occurs
because of hidden computational errors, and that
helps to identify clusters of scores belonging to
pupils who need further checking.
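The plot-then-correlate procedure described above can be sketched in a few lines of Python. This is an illustration only, not the computation used in the study: the scores are hypothetical, and the function names `bivariate_table` and `pearson_r` are assumptions introduced here.

```python
from collections import Counter
from math import sqrt


def bivariate_table(first, second):
    """Count how many pupils fall in each (first score, second score) cell."""
    return Counter(zip(first, second))


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical fall and spring stanines for eight pupils (not study data).
fall = [2, 3, 4, 5, 5, 6, 7, 8]
spring = [3, 3, 5, 4, 6, 6, 8, 7]

table = bivariate_table(fall, spring)  # cell counts for the bivariate chart
r = pearson_r(fall, spring)            # strength of the linear relationship
```

Inspecting `table` before trusting `r` is the point of the paragraph above: an isolated cell far from the diagonal flags a pupil, or a keypunch error, worth checking.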

There are certain conditions underlying the
computation of Pearson product-moment correlations,
relating to similarity of the shape of distribution
on the two variables, etc., that are highly
technical and need not be considered here. It
suffices to say "for the record" that the
relationship must be linear.

In this study the bivariate plots were made after
the scores had been reduced to stanine form. It then
was possible to see clearly the general level of
relationship between the first test and the second.
In the context of this study, this relationship is
between the Fall test and the Spring test results,
separately by subtest. Since stanines are normalized
standard scores, always having a mean of 5 and a
standard deviation of 2 for scales based on the same
population, such bivariates are especially easy to
study. 1/
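As a hedged illustration of what "reduced to stanine form" involves, the sketch below assigns stanines from percentile ranks using the conventional 4-7-12-17-20-17-12-7-4 percent split. The report does not state the exact conversion rule used, so the mid-percentile-rank rule and the name `to_stanines` are assumptions. With this split the mean is 5 and the standard deviation is approximately 2.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Upper percentile-rank limits of stanines 1 through 8 under the
# conventional split; any rank above 96 falls in stanine 9.
CUMULATIVE_PERCENT = [4, 11, 23, 40, 60, 77, 89, 96]


def to_stanines(raw_scores):
    """Convert raw scores to stanines within their own group (assumed rule)."""
    ordered = sorted(raw_scores)
    n = len(ordered)
    result = []
    for score in raw_scores:
        below = bisect_left(ordered, score)
        tied = bisect_right(ordered, score) - below
        # Mid-percentile rank: cases below plus half of the tied cases.
        percentile_rank = 100.0 * (below + 0.5 * tied) / n
        result.append(1 + bisect_right(CUMULATIVE_PERCENT, percentile_rank))
    return result


# Hypothetical group of 100 pupils with distinct raw scores 1..100.
raw = list(range(1, 101))
stanines = to_stanines(raw)
counts = Counter(stanines)
```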

The writer has determined empirically that the
percent of cases falling within the mid-stanine
range (a band three stanines wide running from lower
left to upper right) will correspond almost exactly
to the correlation coefficient if the stanines for
the two variables are computed on the same group.
The exception noted in footnote 1 makes this only
approximately true for these distributions. The
further significance of this finding about the
relationship of the correlation coefficient and
percent in the mid-stanine band will be discussed at
a later point.
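The percent in the mid-stanine band can be counted directly from the paired stanines. The sketch below reads "a band three stanines wide" as the diagonal cell plus one stanine on either side, which is an interpretation rather than a rule stated in the report; the data and the name `percent_in_mid_band` are hypothetical.

```python
def percent_in_mid_band(fall_stanines, spring_stanines):
    """Percent of pupils whose two stanines differ by no more than one."""
    pairs = list(zip(fall_stanines, spring_stanines))
    in_band = sum(1 for f, s in pairs if abs(f - s) <= 1)
    return 100.0 * in_band / len(pairs)


# Hypothetical paired stanines for ten pupils (not study data).
fall = [2, 3, 4, 5, 5, 6, 7, 8, 5, 9]
spring = [3, 3, 5, 4, 6, 6, 8, 7, 8, 6]

band_percent = percent_in_mid_band(fall, spring)
```

The last two pupils shift by three stanines and fall outside the band; these are the cases the text singles out for follow-up.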

1/ There is a slight exception to the qualification
"same population" in this study. Fall stanines are
based on the total state group tested at each grade
level; spring stanines are necessarily based on the
performance of the random sample.






N.H. Statewide Testing Program Evaluation - VIIIA 
Statistical Correlation as a Process 

When one studies the relationship between any two
sets of data using the Pearson product-moment
correlation procedure, the technique itself
transforms the scores into standard scores with a
mean of 0 and a standard deviation of 1. If raw
scores are transformed to standard scores such as
stanines with the same mean and standard deviation,
the correlation plot will be generally symmetrical,
and a line drawn on a bivariate chart from lower
left to upper right will bisect the correlation
surface. Normality of the distribution is not
involved; symmetry is. The plotting of regression
lines, i.e., lines drawn through the means of the
arrays, will reflect the magnitude of the
relationship existing between the two tests. If the
correlation is plotted in terms of raw scores on the
same test given twice, the difference between the
means is roughly a measure of the average gain in
raw scores that has been made by the group from the
first testing period to the second. This does not
mean that all individuals should have or even could
show an equal gain.
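Both points in this paragraph can be shown with a short Python sketch: the Pearson coefficient is the mean product of the paired standard scores, and the group's average raw-score gain is the difference between the two means. The raw scores and function names below are hypothetical and serve only as an illustration.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standard scores with mean 0 and standard deviation 1."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def r_from_z(x, y):
    """Pearson r computed as the mean product of paired z-scores."""
    products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return mean(products)


def average_gain(first, second):
    """Difference between the group means of the two administrations."""
    return mean(second) - mean(first)


# Hypothetical raw scores for five pupils on the same test given twice.
fall_raw = [10, 12, 15, 18, 20]
spring_raw = [14, 15, 20, 21, 25]

r = r_from_z(fall_raw, spring_raw)
gain = average_gain(fall_raw, spring_raw)
```

Here the average gain is four raw-score points, yet the individual gains range from three to five, which is the caution in the paragraph's last sentence.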

No measure of gain is obvious when transformed
scores, such as stanines, are used if these
transformations are computed on the basis of scaling
done independently for the two test administrations
on the same group.

A child earning the same stanine both times has
progressed as expected. An upward shift on the
second test means an acceleration in his relative
position in the group; a downward shift means less
progress than would be true if he moved ahead at his
expected rate.

When the tests being compared or correlated have
been administered seven months apart, one must seek
strenuously to find logical and persuasive reasons
why some individuals perform poorly in one test and
well in the other, regardless of the order in which
this difference occurs. Some part of the difference,
of course, will be random error, but not all and not
more than would be true if the tests were
administered within a short time span. Some part
will be bias, i.e., changes resulting from
identifiable causes. All identifiable factors
influencing the performance of individuals must be
diligently sought. Such differences as can be
attributed to known influences are not assignable to
the error of measurement.

For this situation Stanford subtest scores for fall
versus spring constitute the paired scores. Stanines
were independently derived for fall and spring
administrations. The position of any individual on
any chart will reflect what has happened to that
individual during the interim period, but only in
the sense that any change in his stanine means an
upward or downward shift in his relative position in
the group. Maintenance of his original status simply
means that he has learned at the same rate as others
like himself in ability.

It would be possible to study absolute gains in
terms of standard scores only if the second test
score is interpreted in terms of the standard scores
assigned on the basis of the standard score
transformation obtained from the distribution of the
scores on the first test. This might have been done
in the case of the New Hampshire data and, in some
ways, it might have been more instructive than the
procedure that was followed. 1/ Instead, as stated
earlier, stanine comparisons are made in terms of
fall stanines for the total state group and spring
stanines for the random sample. The assignment of
spring stanines on the basis of the random
(representative) sample scores was a necessary
condition; it was basic to the whole idea of testing
the random (representative) sample in the spring in
order to provide some reasonable way of comparing
fall-spring performance.
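The alternative mentioned at the start of this paragraph, reading the spring score on the fall scale, can be sketched as follows. This was not the procedure followed in the study. The function name `stanine_on_reference`, the mid-percentile-rank rule, and the scores are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Upper percentile-rank limits of stanines 1 through 8 (conventional split).
CUMULATIVE_PERCENT = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_on_reference(score, reference_scores):
    """Stanine a raw score would earn within a reference distribution."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)
    tied = bisect_right(ordered, score) - below
    percentile_rank = 100.0 * (below + 0.5 * tied) / len(ordered)
    return 1 + bisect_right(CUMULATIVE_PERCENT, percentile_rank)


# Hypothetical fall raw-score distribution serving as the fixed scale.
fall_distribution = list(range(1, 101))

# One pupil's raw scores in fall and spring (hypothetical).
fall_stanine = stanine_on_reference(50, fall_distribution)
spring_stanine_on_fall_scale = stanine_on_reference(70, fall_distribution)
absolute_gain = spring_stanine_on_fall_scale - fall_stanine
```

Because both scores are read against the same fall distribution, the difference reflects growth rather than only a change in relative position.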

Knowledge of the existence and significance of the
regression effect for all individuals in the
bivariate distribution except those at the mean
further helps one in his attempt to make sense of
paired comparisons. "Regression" is the name for a
phenomenon widely ignored or misunderstood, namely,
the tendency for high first measures to be lower on
the second measure on a comparable instrument, and
vice versa. Tall parents have tall children, but not
as tall as they are, and vice versa. This effect is
always present. Low-scoring individuals tend to
improve on a second test just by chance; high
scorers tend to fall back. Only when the shift is
greater than can reasonably be accounted for by
chance can one be sure the shift is due to a
systematic influence.
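The size of the regression effect follows from the correlation itself. As a minimal sketch, assuming a linear relationship and the same mean and standard deviation on both occasions, the expected second stanine lies a fraction r of the way from the mean to the first stanine. The function name and the value r = 0.75 are illustrative and are not taken from the study's tables.

```python
def expected_retest_stanine(first_stanine, r, mean=5.0):
    """Expected second stanine under linear regression with equal spreads."""
    return mean + r * (first_stanine - mean)


# With a hypothetical fall-spring correlation of 0.75:
high = expected_retest_stanine(9, 0.75)     # a stanine 9 pupil regresses down
low = expected_retest_stanine(1, 0.75)      # a stanine 1 pupil regresses up
middle = expected_retest_stanine(5, 0.75)   # a pupil at the mean stays put
```

A shift has to exceed this expected drift, plus chance variation around it, before a systematic influence can be claimed.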

A little exercise of common sense, after chasing
down the protocols for deviant individuals so as to
study the performance of these individuals from one
time to another, often can turn up the logical
reasons why a particular performance was so
atypical.

For example, perhaps on one test answer sheet the
marks were not sufficiently heavy for the optical
scanner to pick them up satisfactorily, while on
another test the marks were quite readable. This
would, of course, invalidate the two test
comparisons for that individual.

Perhaps on one occasion the individual might have
guessed substantially, marking every item on the
test "with his eyes closed", so to speak, after he
had done as much as he could do in terms of his own
knowledge. This guessing factor on a test made up of
multiple choice questions with four or five options
would substantially raise his score and therefore
his stanine placement. If, on the second test, he
had sufficient self-confidence or he had been taught
in the meantime that testing is supposed to be a
true communication act calling for truthful
responses of a non-chance nature and that he only
does himself harm by guessing, his score the second
time might actually be lower than it was the first
but would be more truly reflective of his status in
the group.
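The expected contribution of blind guessing is simple arithmetic: each guessed item has one chance in k of being right, where k is the number of options. The short sketch below uses hypothetical item counts, and the function name is an assumption.

```python
def expected_chance_score(items_guessed, options_per_item):
    """Expected number of guessed items answered correctly by chance."""
    return items_guessed / options_per_item


# A pupil who blindly marks 20 unanswered items (hypothetical figures):
bonus_four_choice = expected_chance_score(20, 4)   # four-option items
bonus_five_choice = expected_chance_score(20, 5)   # five-option items
```

A bonus of four or five raw-score points can be enough to move a pupil up a stanine on a short subtest, which is why unequal guessing on the two occasions distorts the comparison.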

With these thoughts in mind, it will be most helpful
and provocative to study the following bivariate
charts. (See Charts VIII-1 to VIII-10.)

1/ A separate study is under consideration. 

[Charts VIII-1 to VIII-10: bivariate distributions of
fall versus spring stanines, by subtest, for the
random sample and for Title I cases.]



N.H. Statewide Testing Program Evaluation - VIIIB



SECTION VIII 
Part B 

Data Concerning the Measures of Relationship 
Between Tests Administered in the Fall 
And Repeated in the Spring 

Immediately preceding this discussion the writer has
made an attempt to deal with a few of the issues
involved in the interpretation of correlation
coefficients. Much more could be said, but even
these cautionary notes may be considered by some
readers to be superfluous. Essentially, the task of
making sense out of correlation coefficients calls
for a level of statistical competence and
sophistication that probably does not characterize
more than a small fraction of the people in public
education at both administrative and instructional
levels.

Few people, for example, are aware of the fact that
reliability coefficients can be unduly inflated by
drawing a sample that is as heterogeneous as
possible. As a matter of fact, the population
samples used for determining reliability
coefficients for the Stanford, consisting of
1000-case random samples from the standardization
group, probably are about as variable as any group
could be and remain within a grade. Sensible
reliability coefficients are computed on well
described community samples so that the values
obtained will be descriptive of the local scene.
Since we have no data on reliability for the
communities within this State, we are including in
Table VIII-B-1 the split-half reliability
coefficients for Stanford subtests used in this
study as reported in the Technical Manual. These
values almost surely overestimate the tests'
reliability in the context in which they are used,
but lacking something better, they will have to
serve the purpose.
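How much a heterogeneous sample inflates a reliability coefficient can be estimated with a standard adjustment, sketched below under the assumption that the error variance of the test is the same in both groups. The figures are hypothetical and are not values from the Technical Manual or from Table VIII-B-1; the function name is also an assumption.

```python
def reliability_in_local_group(r_reference, sd_reference, sd_local):
    """Reliability expected in a group with a different score spread.

    Assumes the error variance is identical in the reference group and
    the local group; only the true-score spread differs.
    """
    error_variance = sd_reference ** 2 * (1.0 - r_reference)
    return 1.0 - error_variance / sd_local ** 2


# Hypothetical: a reported reliability of .90 in a norm sample with a
# standard deviation of 10, applied to a local group with a standard
# deviation of 5.
local_reliability = reliability_in_local_group(0.90, 10.0, 5.0)
```

In this example a published coefficient of .90 corresponds to about .60 in the narrower local group. The same reasoning applies to any group less variable than the one on which a coefficient was computed.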

In the next adjacent column, the correlations 
between fall and spring tests are reported for each 
of the five Stanford tests consistently studied in 
this report. The correlations indicate the rela- 
tionship between the Stanford subtests given in 
October of '69 and repeated in May of '70. The
data are given separately for Grade 4 and Grade 6
and for the random (representative) sample and
Title I. Notice that the correlations are consistently
lower in Grade 4 than in Grade 6. The writer
knows of no systematic and underlying cause for
the difference in the magnitude of these values.

Possibly the subtest scores for the Intermediate I 
Battery were less normally distributed than those 
for the Intermediate II Battery which was used at 
Grade 6. Possibly these tests were more relevant 
to the instruction in the 6th grade than at the 
4th grade. Or perhaps the bias in the two grade 
populations is a sufficient explanation. This in- 
consistency is only an illustration of the fact 
that correlation coefficients are not self-interpreting
statistics with a common meaning regardless
of the situation within which they were obtained.
All correlations for the random sample are lower
by a substantial margin than the reported reliability
coefficients. This is really not unexpected
since the reported reliabilities are instrument
reliabilities (pupil variation controlled) as
contrasted with the test-retest comparisons.

When one moves to similar values for Title I,
it is interesting to note that in every instance, 
with one possible exception in Grade 6, the cor- 
relations are clearly lower than for the random 
sample and lower in the spring than in the fall. 
Above all else, this reflects the fact that the 
Title I group is substantially less variable than 
the random sample. However, we would like to think 
that some part of the lowering of the correlations 
actually is due to something that happened to the 
children during the period from October to May. If 
pupils have validly diagnosed remedial defects and 
if they are provided with adequate instruction to 
counteract specifically the defined and described 
defects, the net result would be to lower the cor- 
relation between their first testing and their 
second due to the fact that the amount of gain or 
improvement under special instruction will vary 
widely among individuals, depending in large meas- 
ure on the extent or magnitude of their difficulty 
in the first place, and the effectiveness of the
special instruction. For example, children known 
to have a correctible reading deficiency based 
upon adequate diagnosis will improve greatly over 
a relatively short period of time, but certainly 
not in a manner consistent from individual to in- 
dividual since this depends upon the nature of the 
defect in the first place and the adequacy of the 
instruction to correct it in the second place. 
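
The argument that uneven gains lower the fall-spring correlation can be checked with a small calculation. The sketch below is a simplification under stated assumptions: the spring score is treated as the fall score plus a gain, measurement error is ignored, and the function name and the standard deviations are hypothetical rather than taken from the New Hampshire data.

```python
import math


def retest_correlation(sd_fall, sd_gain, r_fall_gain=0.0):
    """Correlation between fall score F and spring score S = F + G.

    sd_gain is the spread of individual gains G; r_fall_gain is the
    assumed correlation between fall standing and gain.
    """
    cov_fs = sd_fall ** 2 + r_fall_gain * sd_fall * sd_gain
    var_s = sd_fall ** 2 + sd_gain ** 2 + 2 * r_fall_gain * sd_fall * sd_gain
    if sd_fall <= 0 or var_s <= 0:
        raise ValueError("fall and spring scores must both vary")
    return cov_fs / (sd_fall * math.sqrt(var_s))


# The more the gains vary from pupil to pupil, the lower the correlation.
for sd_gain in (0.0, 0.5, 1.0, 1.5):
    print(sd_gain, round(retest_correlation(1.5, sd_gain), 2))
```

A uniform gain for every pupil (zero spread) leaves the correlation untouched; only gains that differ among individuals pull it down.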

Tables VIII-B-2 and 3 reproduce eight correlation
matrices showing the intercorrelations of
the Stanford subtests separately for the random
sample and for the Title I cases and separately
for Fall and Spring for Grades 4 and 6. These
eight matrices are interesting indeed to study but
are only a basis for speculation without knowing
a lot more about what took place between the Fall
testing program and the Spring follow-up program.
We must assume that the program of instruction for
the children in the random sample was just about
normal or typical of what goes on ordinarily. That
being the case, it is rather clear that even here
influences and factors are at work which tend to
dilute the degree of agreement between a series of
tests taken two at a time.

Perhaps this is a good place to leave to those
who wish to speculate what the significance of
these mathematical coefficients in truth may be
and to turn, for the benefit
of those who are perhaps more visually minded and
less statistically oriented, to a consideration of
the small bivariate distributions set up in terms
of stanines, which make more evident the nature of
the relationship between Fall and Spring results
separately by tests and separately for the random
sample compared to the Title I group.

The first series of bivariate charts relates
to Grade 4 and each test's bivariate chart for the
random sample is paired with the corresponding
bivariate chart for that test for Title I children.

As one goes from chart to chart, it is evident
that there are some maverick cases in almost every
chart where it seems quite unlikely that the protocols,
i.e., the individual test results,
were valid in both instances. For example, in the
Paragraph Meaning bivariate chart for the random
sample, it is evident that a child who earned a
stanine of 8 in the Fall would be most unlikely to
earn a valid stanine of only 2 when retested in
the Spring. Every peculiar case of this kind,
falling far out from the general cluster of scores,
should have been investigated case by case.
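
The screening described here can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the pupil identifiers and stanines are invented, and the cutoff of four stanines for a "maverick" case is a hypothetical choice that a local school would have to set for itself.

```python
from collections import Counter


def bivariate_chart(pairs):
    """Count pupils in each (fall stanine, spring stanine) cell."""
    return Counter((fall, spring) for _, fall, spring in pairs)


def maverick_cases(pairs, min_shift=4):
    """Return pupils whose stanine moved by at least min_shift."""
    return [(pupil, fall, spring) for pupil, fall, spring in pairs
            if abs(spring - fall) >= min_shift]


# Invented records: (pupil id, fall stanine, spring stanine).
pupils = [("A", 5, 5), ("B", 8, 2), ("C", 4, 6), ("D", 3, 3), ("E", 2, 7)]

chart = bivariate_chart(pupils)
flagged = maverick_cases(pupils)
print(flagged)
```

The flagged list is only a starting point; each case would still call for inspection of the pupil's own answer document.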

However the general practice is NOT EVEN TO
HAVE THE ANSWER DOCUMENTS RETURNED. The least
valuable part of the total information obtained
by testing is given complete primacy. Having item
analysis information is no help. Having a chart
showing how every item was answered is better but
none of these helps one learn why some pupils acted
erratically and none permits the pupil to share
adequately with the teacher his areas of strength
and weakness.

These data are available for further study 
since the answer sheets or scoreable booklets were 
returned and have been stored in the hope that 
funds could be made available for making this kind
of detailed inspection for the sake of what it
might show up by way of insights into the dynamics
of testing and thus improve future programs.

Generally, however, the frequencies are clustered,
more or less symmetrically, centered around
the mid-stanine range or band. It must be remembered
that in these bivariates, Fall stanines were
based upon the total sample of children tested
throughout the state, ranging from 11,700+ in Grade
6 to 12,000+ in Grade 4, because these stanines
were already on the tape. Stanines based upon the
representative sample might have been preferable
but the others already were in the hands of the
local schools. The Spring stanines, on the other
hand, are based, of necessity, upon the performance
of the tested random sample. These stanines
were used, of course, to interpret the results for
Title I cases also.
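
How stanines derived from one group are applied to another can be shown with a short sketch. It uses the conventional stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4); the mid-point percentile rank, the function names, and the score lists are assumptions for illustration and are not the conversion actually run for this program.

```python
from bisect import bisect_right

# Cumulative percent at the upper edge of stanines 1 through 8
# (conventional 4-7-12-17-20-17-12-7-4 percent split).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_rank(score, reference_scores):
    """Mid-point percentile rank of score within the reference group."""
    if not reference_scores:
        raise ValueError("reference group is empty")
    below = sum(1 for s in reference_scores if s < score)
    equal = sum(1 for s in reference_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(reference_scores)


def stanine(score, reference_scores):
    """Stanine of score using the reference group's distribution."""
    return 1 + bisect_right(STANINE_CUTS, percentile_rank(score, reference_scores))


# Invented data: the reference (random) sample defines the conversion,
# which is then applied to scores from a second (Title I) group.
random_sample = list(range(1, 101))
title_i_scores = [12, 30, 50]
print([stanine(s, random_sample) for s in title_i_scores])
```

Because the second group is read against the first group's conversion, its stanines need not average 5 or have a standard deviation near 2.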

The correlation coefficients are given at the
bottom of each of the bivariate charts and for the
random sample only, the percent of children in the
mid-stanine band is also reported. It will be observed
that this percentage is very similar to the
correlation coefficient and if both sets of data,
Spring versus Fall, had been based upon the same
stanine transformation, these percentages would be
even closer. Similar percentages are not given
for Title I because the stanines were not independently
derived for that sample and the advantages
of this comparison with r would have been lost.

Generally speaking, there is a curtailment in
the distribution for Title I which shows up by a
thinning of the scatterplot in the upper right-hand
corner. This is most evident if one looks at
the marginal figures and notes that the distributions
are skewed in the sense that there are fewer
cases in the upper ranges for Title I than for the
random sample, which are more or less symmetrical.

Since the stanines for Fall and the stanines
for Spring were computed independently, one cannot
observe growth directly by comparing the Fall and
Spring performance. This point has been discussed
earlier in the introductory portions of this section.
It will be noted, however, that the random
sample means closely approximate 5 and the random
sample standard deviations closely approximate 2
in every bivariate for both grades. On the other
hand, the means for the Title I sample tend to be
substantially lower, especially in the Spring, and
while the standard deviations vary, they also tend
to be somewhat smaller than those for the random
sample. 1/

Considering just the Title I pupils, it is
evident that the relationship, i.e., correlation,
is far from 1.00 between the Fall and Spring data
but it is also evident that it would be possible 
to identify children falling substantially outside 
the mid-stanine range who should have been inves- 
tigated pupil by pupil if these data had been re- 
ported promptly enough to the schools. The ideal 
arrangement would have been to have the answer 
sheets for all children returned to each school 
and for someone to undertake the task at the local
level of examining suspect answer sheets in terms 
of the Fall-Spring paired responses to see which 
responses failed to be consistent from one testing 
program to another. This would have been especially
helpful in this instance since the same
form was used.

The bivariates deserving most serious study
are those relating to Paragraph Meaning since this
was the curriculum area where major emphasis was
put in the Title I program. However, in doing so,
please remember that these data describe all Title
I children, not just those in reading programs. A
separate analysis will be prepared as a supplement 
to this report at a later date analyzing the data 
in a somewhat similar fashion for those who were 
in remedial reading programs. 

Bivariate charts of this sort take on their
greatest importance as a basis for helping a
teacher, administrator, or supervisor to identify
individual cases and study them against the background
of the performance of the group as a whole.
An isolated case is hard to interpret; a case in a
defined and charted distribution is more easily
studied and understood. For this reason, it is
also most helpful to have the data for the typical
or random sample for comparison with the specifically
designated Title I cases. It is also possible
that even the sophisticated will have a renewed
sense of what a correlation really means if
those who bother to read the report take a good
look at the charts.

1/ See Table VIII-B-4.




With the presentation of the data on these bivariate
charts, the statistical portions of this
report are completed. It just remains, therefore,
to sum up and to provide the reader with the writer's
own evaluation of the total program in terms
of an overview of all of the data available. Obviously,
this is a highly subjective process and
disagreement as to the significance of these data
can be expected. Every individual in a position
of responsibility must perform this tedious task 
of studying the data for himself. It has been 
this writer's intention within the sadly lacking 
basic data to select and highlight those parts 
that to him seemed most significant. 



Table VIII-B-1

Correlations* Between Selected Stanford Subtests
Administered in the Fall and Repeated in the Spring
In Comparison with Reported Reliability Coefficients

Grade 4 - 1969-70
SAT: Int. I, Form X

                                          Random
Subtest                   Reliability^    Sample    Title I

Word Meaning                  .90           .77       .56
Paragraph Meaning             .92           .69       .43
Arithmetic Computation        .89           .60       .46
Arithmetic Concepts           .86           .71       .57
Arithmetic Applications       .86           .68       .57

Grade 6 - 1969-70
SAT: Int. II, Form X

                                          Random
Subtest                   Reliability^    Sample    Title I

Word Meaning                  .90           .83       .75
Paragraph Meaning             .93           .82       .72
Arithmetic Computation        .89           .69       .70
Arithmetic Concepts           .85           .76       .69
Arithmetic Applications       .89           .79       .63

* Based on raw scores.

^ Corrected split-half reliability coefficient as reported
by the publisher, based on random samples of 1,000 cases
per grade from the standardization sample.






Table VIII-B-2

Intercorrelations* of Selected Stanford Subtests
for Random Sample and for Title I Separately Tested
Fall and Spring

Grade 4: FALL
SAT: Intermediate I

A. RANDOM SAMPLE

Test Name      No.      1      2      6      7      8
Word Mng.        1   1.00
Para. Mng.       2    .72   1.00
Arith. Comp.     6    .29    .38   1.00
Arith. Conc.     7    .54    .57    .48   1.00
Arith. Appl.     8    .52    .53    .47    .72   1.00

B. TITLE I

Test Name      No.      1      2      6      7      8
Word Mng.        1   1.00
Para. Mng.       2    .56   1.00
Arith. Comp.     6    .23    .33   1.00
Arith. Conc.     7    .37    .48    .46   1.00
Arith. Appl.     8    .39    .44    .35    .53   1.00

Grade 6: FALL
SAT: Intermediate II

C. RANDOM SAMPLE

Test Name      No.      1      2      5      6      7
Word Mng.        1   1.00
Para. Mng.       2    .80   1.00
Arith. Comp.     5    .47    .52   1.00
Arith. Conc.     6    .62    .61    .59   1.00
Arith. Appl.     7    .65    .66    .62    .76   1.00

D. TITLE I

Test Name      No.      1      2      5      6      7
Word Mng.        1   1.00
Para. Mng.       2    .76   1.00
Arith. Comp.     5    .44    .54   1.00
Arith. Conc.     6    .59    .57    .57   1.00
Arith. Appl.     7    .58    .63    .57    .68   1.00

* These correlations, which are based on raw scores, tend to run 2 to 3 points
lower than the stanine correlations reported with the bivariates, due to
coarseness of grouping in the case of stanines.







Table VIII-B-3

Intercorrelations of Selected Stanford Subtests
for Random Sample and for Title I Separately Tested
Fall and Spring

Grade 4: SPRING
SAT: Intermediate I

A. RANDOM SAMPLE

Test Name      No.      1      2      6      7      8
Word Mng.        1   1.00
Para. Mng.       2    .77   1.00
Arith. Comp.     6    .38    .47   1.00
Arith. Conc.     7    .60    .62    .38   1.00
Arith. Appl.     8    .58    .66           .75   1.00

B. TITLE I

Test Name      No.      1      2      6      7      8
Word Mng.        1   1.00
Para. Mng.       2    .61   1.00
Arith. Comp.     6    .33    .42   1.00
Arith. Conc.     7    .42    .53    .54   1.00
Arith. Appl.     8    .46    .56    .56    .69   1.00

Grade 6: SPRING
SAT: Intermediate II

C. RANDOM SAMPLE

Test Name      No.      1      2      5      6      7
Word Mng.        1   1.00
Para. Mng.       2    .80   1.00
Arith. Comp.     5    .48    .61   1.00
Arith. Conc.     6    .62    .70    .71   1.00
Arith. Appl.     7    .64    .71    .67    .79   1.00

D. TITLE I

Test Name      No.      1      2      5      6      7
Word Mng.        1   1.00
Para. Mng.       2    .73   1.00
Arith. Comp.     5    .44    .52   1.00
Arith. Conc.     6    .56    .59    .64   1.00
Arith. Appl.     7    .61    .62    .62    .73   1.00






Table VIII-B-4

Means, Standard Deviations, and Correlation Coefficients
for the Random Sample and Title I Cases, Fall versus Spring

                          RANDOM SAMPLE                      TITLE I
SAT Int.                     Standard                        Standard
Subtest             Mean    Deviation    r     N     Mean    Deviation    r     N

Grade 4 (SAT: Int. I)

Word Meaning
  Spring            4.96       2.0      .75   583    2.71       1.5      .52   428
  Fall              5.05       1.9                   3.17       1.6
  Difference        -.09                             -.46

Para. Meaning
  Spring            5.00       2.0      .67   584    2.95       1.5      .37   430
  Fall              5.03       1.9                   3.32       1.4
  Difference        -.03                             -.37

Arith. Comp.
  Spring            4.98       2.0      .56   583    3.90       1.9      .43   429
  Fall              4.96       2.0                   4.24       1.9
  Difference         .02                             -.34

Arith. Conc.
  Spring            5.01       1.9      .68   581    3.52       1.7      .53   435
  Fall              4.93       1.9                   3.57       1.8
  Difference         .08                             -.05

Arith. Appl.
  Spring            5.01       2.0      .65   579    3.28       1.8      .53   429
  Fall              4.95       1.9                   3.45       1.7
  Difference         .06                             -.17

Grade 6 (SAT: Int. II)

Word Meaning
  Spring            4.98       2.0      .83   641    3.28       1.7      .74   231
  Fall              4.99       2.0                   3.15       1.6
  Difference        -.01                              .13

Para. Meaning
  Spring            4.99       1.9      .79   642    3.45       1.6      .71   237
  Fall              5.02       2.0                   3.23       1.6
  Difference        -.03                              .22

Arith. Comp.
  Spring            4.98       1.9      .67   646    3.84       1.8      .64   235
  Fall              5.00       1.9                   4.15       2.0
  Difference        -.02                             -.31

Arith. Conc.
  Spring            4.96       1.9      .73   645    3.81       1.7      .65   236
  Fall              4.98       2.0                   3.77       1.9
  Difference        -.02                              .04

Arith. Appl.
  Spring            4.96       1.9      .74   642    3.79       1.6      .61   234
  Fall              5.02       1.9                   3.83       1.8
  Difference        -.06                             -.04






N.H. Statewide Testing Program Evaluation - IX



SECTION IX

A Personal Commentary

I feel it quite necessary at this point to
evaluate this report and especially all that has
led up to it in terms of what lessons it may have
taught as well as what it "proves" about Title I
programs in New Hampshire and, by implication,
elsewhere.

1. The process has been too time-consuming by
many months. This has resulted because of lack of
coordination at all levels and between all agencies
involved. To a very large extent this was
inevitable at first as we groped our way toward a
configuration that would answer our questions
about the effectiveness of Title I programs and
still deal realistically with the mensuration
problems involved. Before-after testing with instruments
built for a different purpose has a
built-in "bomb" in the reality of errors of measurement
enhanced when two fallible measures are
compared over a seven-month time span.
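
The way errors of measurement compound when two fallible measures are compared can be illustrated with the classical formula for the reliability of a difference score. The sketch below assumes equal variances at the two testings; the function name is hypothetical, and the inputs of .90 and .77 are borrowed from Table VIII-B-1 (Word Meaning, Grade 4) purely as an example, not as a finding of this report.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of a simple spring-minus-fall difference score.

    r_xx, r_yy: reliabilities of the two measures.
    r_xy: correlation between them.
    Assumes equal variances at the two testings.
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1")
    return (0.5 * (r_xx + r_yy) - r_xy) / (1.0 - r_xy)


# Illustrative inputs: reliability .90 at both testings and a
# fall-spring correlation of .77.
gain_reliability = difference_score_reliability(0.90, 0.90, 0.77)
print(round(gain_reliability, 2))
```

Even with two quite reliable tests, the difference between them is far less dependable than either score alone.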

There are no precedents to follow, and any
shallow or superficial analysis using inappropriate
sampling statistics will neither reveal the
inherent dangers nor provide insightful suggestions
for the future.

My own conclusion is that the variations in 
individual pupil performance noted on our tests 
are quite as much due to the pupil's built-in day- 
by-day variability plus the inefficiencies of our 
present educational process as they are to the instruments
themselves.

I have tried to bring this out in several
places in the text. Better instruments are needed,
to be sure, but no instrumentation, no matter
how good, will nullify variability built into the
situation, not the tests.

2. New Hampshire is a remarkably typical
state as determined by national norms. The use of
the MAT equivalence tables to re-interpret Stanford
data reinforces the Otis-Lennon data in establishing
this conclusion, which I have repeatedly
observed over the last twenty years.

3. The tested random sample was quite representative.
We were luckier than we deserved!

4. The Title I cases for whom data were available
reinforced many observed characteristics of
children in the stipulated socio-economic strata.
However, I think it only fair to note that there
are few target schools as such in New Hampshire
outside a few of the large communities. Many of
these failed to test in any case. Thus Title I
help may have been extended to needful children
not necessarily from economically deprived homes.
This is good in a state where the legislature apparently
cares so little for the welfare of our
children.



Among such groups (as delineated by the available
Title I data):

a. Boys fall behind girls most of the time in
many measurable ways.

b. Learning ability of the Title I samples is
lower than average by significant amounts
and this is reflected in school performance.
The Grade 6 sample is better than Grade
4 and generally Grade 6 end-of-year performance
is relatively better than Grade 4
at the same time period.

It profits us nothing to argue about inherited
vs environmentally derived cognitive
skills. These kids need special recognition,
special instructional materials,
more individual attention including outside
supportive services by way of mental
health and reading clinics, more love and
affection and thus less sense of failure
and self-deprecation.

Obviously these are personal opinions, but
they are borne out by experience and common
sense and are consistent with these
data.

c. Relatively few of the Title I cases show
clear signs of having correctible defects
in learning skills compared to being just
a slow moving group. Our data were not analyzed
to maximize the chances of discovering
such disabilities since, in this initial
report, all tested Title I cases are
combined.

More effort should be expended to search
out these children with special cognitive
learning blockings by adequate diagnoses
and to provide the kind of corrective instruction
that might be called the "prescription
education" to emphasize its relevancy
to individual needs.

The U.S. Office of Education should plan more
carefully to stimulate these two kinds of efforts,
i.e., better instruction for slow learners with
reasonable goals; and diagnostic identification
and remediation for the educationally handicapped.
The nonsensical specification that all be brought
up to grade, whatever that means, should be buried
enough so it shall not confuse the issue any longer.

Packaged, debugged programs, thoroughly field
tested, should be recommended by the USOE with
especial care for the pupil accounting aspects of
the program. Provision should be made for cumulative
data files. No danger of the Pygmalion phenomenon
(in reverse) need worry us if teachers
will learn to see children as individuals in a
competitive society that had just better realize
that it takes all kinds of people to make the
world go around. The self-fulfilling prophecy notion
is an insult to the intelligence as well as
the good intentions of the teaching staff. Given 
half an opportunity, there are few time-tested 
teachers (those staying in the profession because 
they love children) who will not welcome new ways,
new aids, and new means to enable them to help
all children learn. 

In all this "evaluation" effort the child is
almost forgotten. Has he been informed as to why
we test - and retest? Has he even seen the results
of his efforts in situ in terms of questions
answered or not answered correctly? Has the notion
of testing as one way he can get across to
his teacher just what he does and does not know
ever been thought of by the teacher, to say nothing
of having been communicated to him? If so, little
evidence of this has reached this level of activity.

Holt's philosophy in "How Children Fail" is
largely beside the point. "Crisis in the Classroom"
comes much closer to the truth.

Not enough has been said about evaluating 
performance project by project. The problem in 
New Hampshire is the small size of the local ad- 
ministrative units and thus the small N's one has 
to deal with. 

Comparison of distributions in terms of cen- 
tral tendency statistics or percentiles is not 
satisfactory. The best plan would appear, at this 
moment in time, to be a pupil-by-pupil evaluation
where pupils would be studied against before-after 
bivariates of the most relevant tests. The "most 
relevant test" in reading might be a standardized 
battery, but might constitute selected material 
from such a battery or two forms of the reading
test given within the same week and repeated at 
the end of the period of instruction. 

The greatest chance to show dramatic changes
is to work in an area such as math where individual
item changes might be observed. The curriculum-valid
material selected would be that suitable
for the Title I group and not just the test ordinarily
used at a particular grade. To use the nomenclature
commonly used in this report, the
spring test answer document might be the scoring
key for the fall document. The scoring might involve
a multi-step procedure to determine:



a. the number and character of identical 
items answered consistently. 

b. the number going from wrong to right and 
vice versa. 

c. the percent of the group answering selected
items considered needful of mastery as
an hierarchical step-up toward an eventual 
goal of mastery of essential skills. 
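
The three steps listed above can be sketched as a small item-by-item comparison. This is one possible reading, not a procedure carried out in this study: it assumes each pupil's responses have already been marked right or wrong, and the function names, item numbers, and pupil records are hypothetical.

```python
def compare_item_responses(fall, spring):
    """Classify each common item by its fall and spring outcome.

    fall, spring: dicts mapping item number to True (right) or False (wrong).
    """
    counts = {"right_both": 0, "wrong_both": 0,
              "wrong_to_right": 0, "right_to_wrong": 0}
    for item in sorted(fall.keys() & spring.keys()):
        if fall[item] and spring[item]:
            counts["right_both"] += 1
        elif not fall[item] and not spring[item]:
            counts["wrong_both"] += 1
        elif spring[item]:
            counts["wrong_to_right"] += 1
        else:
            counts["right_to_wrong"] += 1
    return counts


def percent_answering_correctly(responses_by_pupil, item):
    """Percent of the group with a right answer on one selected item."""
    marks = [r[item] for r in responses_by_pupil.values() if item in r]
    if not marks:
        raise ValueError("no pupil attempted this item")
    return 100.0 * sum(marks) / len(marks)


# Invented records for one pupil (steps a and b) and a small group (step c).
fall = {1: True, 2: False, 3: False, 4: True}
spring = {1: True, 2: True, 3: False, 4: False}
summary = compare_item_responses(fall, spring)

group_spring = {"p1": {1: True}, "p2": {1: False}, "p3": {1: True}, "p4": {1: True}}
print(summary, percent_answering_correctly(group_spring, 1))
```

The two "both" counts correspond to items answered consistently, and the two change counts to movement from wrong to right and the reverse.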

In math at any grade or developmental level 
one must start with demonstrated skill in number 
manipulation (computation). This must precede 
much attention to problem solving since competency 
in computation sets a go-no go limit on applica- 
tions involving the specific skills. A careful 
study of the item analysis information from the 
Metropolitan Math tests recently released will re- 
veal woeful lack of skill in number manipulation 
at any developmental level. Any nonsense about 
computation skills not being necessary in this
day and age because of the advent of computers is 
sheerest irrelevancy. Problem solving will always 
be necessary and number manipulation is its prerequisite.

Not enough attention has been paid to dis- 
covery of the item types that are most suitable in 
before-after comparisons. The writer could fill 
another report on this subject, but will content 
himself with one generalization - to wit, item
types relatively much freer of guessing than pres- 
ent item forms are absolutely essential and are in 
our grasp if the local groups demand such tests. 

These are merely avenues for exploration, but 
at least they are not the fruitless blind pecking 
activity that best describes the continued reli- 
ance on total score comparisons, especially when 
fallacious methods of interpreting (e.g. grade 
equivalents) are applied to the data. 

My last word 

Title I programs should not be bandaids on a
bad bruise, but preventive education that never
lets the bruising situation occur because insightful,
responsive, and ingenious people are trying
hard with some special help to meet the needs of
each child. Perhaps there should be a Title II
program for the highly advantaged who, in most 
school situations, are equally put upon! 






N.H. Statewide Testing Program Evaluation - Appendix



Appendix A

New Hampshire Statewide Testing Program
Intercorrelations of Stanford Achievement Test and
Otis-Lennon Mental Ability Test
Fall - 1968

Grade 4
SAT: Intermediate I, Form X
OLMAT: Elementary II, Form J

Test Name      No.     1     2     3     4     5     6     7     8     9    10    11    12    13
Word Meaning     1  1.00
Para. Meaning    2   .75  1.00
Spelling         3   .66   .68  1.00
Word Study Sk.   4   .64   .62   .65  1.00
Language         5   .70   .71   .69   .72  1.00
Arith. Comp.     6   .35   .42   .40   .41   .47  1.00
Arith. Conc.     7   .55   .59   .49   .57   .62   .51  1.00
Arith. Appl.     8   .57   .61   .50   .58   .62   .49   .71  1.00
Social Stud.     9   .68   .70   .56   .61   .67   .38   .64   .66  1.00
Science         10   .74   .74   .62   .64   .70   .37   .60   .64   .75  1.00
Comp. Prog.     11   .82   .87   .71   .73   .80   .58   .78   .80   .83   .61  1.00
O-L MAT         12   .76   .76   .66   .70   .75   .44   .68   .69   .74   .76   .91  1.00
I.Q.            13   .72   .71   .63   .68   .73   .42   .64   .65   .68   .71   .89   .94  1.00

Grade 6
SAT: Intermediate II, Form X
OLMAT: Elementary II, Form K

Test Name      No.     1     2     3     4     5     6     7     8     9    11    12    13
Word Meaning     1  1.00
Para. Meaning    2   .80  1.00
Spelling         3   .64   .68  1.00
Language         4   .71   .76   .69  1.00
Arith. Comp.     5   .42   .50   .48   .55  1.00
Arith. Conc.     6   .60   .63   .50   .66   .60  1.00
Arith. Appl.     7   .62   .67   .53   .68   .60   .77  1.00
Social Stud.     8   .73   .77   .55   .73   .49   .70   .74  1.00
Science          9   .74   .78   .53   .70   .43   .62   .67   .79  1.00
Comp. Prog.     11   .bh   .89   .68   .83   .66   .81   .84   .86   .80  1.00
O-L MAT         12   .74   .76   .61   .78   .51   .68   .72   .75   .73   .90  1.00
I.Q.            13   .71   .74   .60   .76   .51   .68   .68   .72   .68   .90   .94  1.00






Technical Handbook 



Appendix B 



TABLE 29

Correlations Between Otis-Lennon and
Stanford Achievement Test

Otis-Lennon MAT

Stanford Achievement Test: 1964 Edition*



Grade 



Ltvtl 



Raw Score 
DIQ 



Laval 



Subtest 



r?aw Score 



Mean 



S.O. 



Mean 



S.D. 



407 



580 



Prim.lt 



ElemJ 



ei9 



Elem. II 



607 



inter. 



11 



•4 



43.13 
103.6A 



54.93 

103.78 



51.56 
103.85 



4546 
10534 



53.75 
103.52 



45.86 
101.92 



6.74 
14.23 



11.00 
13.57 



14.06 
14.77 



14.80 
14.20 



15.17 
13.2/ 



1331 
12.27 



-1 



Prlm.t Word Reading 21.33 6.48 

Piragra^.^ Meaning 20.81 8.92 

Vocabulary 21.79 6.31 

Spelling 11.41 5.46 

Word Study Skills 36.58 a37 

Arithnnetic 38.09 11.93 

Prim. II Word Mer,ning 24.89 5.42 

Paragraph) Me^ining 47.28 9.54 

Science-Social Studies 23.81 535 

Spelling 20.58 6.52 

Word Study Skitis 4&29 11.25 

Language 47.53 931 

Arith. Computation 37.18 9.53 

Arttti. Concepto 2933 837 

Tfiler. II Word Meaning 24.26 8.52 

Paragraph Meaning 3335 11.30 

Spelling 28.15 9.48 

Unguage 85.13 18.12 

Arith. Computation 16.07 6.20 

Arith. Concepts 1438 5.57 

Anth. Applications 17.94 6.89 

Social Studies 39.43 12.26 

Science 31.56 9.71 

Adv. Paragraph Meaning 32.01 1135 

Spelling 2a51 12.05 

Unguase 94.15 17.05 

Arfth. Computation 19.06 7.79 

Arith. Concepts ia27 7.25 

Arlth. Applications 14.02 436 

Social Studiet 4639 12.52 

Science 33.91 933 

H.^ Englhh 4731 16.29 

Numerical Competence 27.03 7.97 

Mathematics 1947 6.63 

rfiMlIng^ 31^1 11.10 

»:ience % Z 32]b5 1 9.44 

^IStl^^ 28JB8 ; , 7.77 

%%imm < \ 27.92 ^ ^.95 

H.S. Eiflish ^ \ 53.73 13.95 

Nftfmffical Ofmpetence 30.12 9.09 

MaUi^matics' 23.60 a74 

^••dinf 37j» 9,a8 

8den^« 35.05 d32 

Sodefstiidles 33.44 0^2 

f:ielllng 32.89 l\io 



From the Otis-Lennon Mental Ability Test Technical Handbook
Reproduced by permission of the publisher



.52 
.47 
.62 
.42 
.54 
.57 

.62 
.60 
.56 
M 
.57 
39 
.50 
.^57 

.7.' 
.78 
.62 
.78 
.60 
.73 
.75 
.74 
.75 

30 
.63 
30 
.67 
.74 
.67 
30 
.70 

.83 
.79 
.7tt 
.83 
.74 



1 1\ 



.79 
.79 

32 
.68 

.74t, 
.53 , 






